The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 
curriculum, learning, teaching, supervision, measurements, 
statistics, and experimental techniques. 








Volume XIV SEPTEMBER, 1945





THE MEASUREMENT OF TEACHING ABILITY 


CONTENTS 


Some Introductory Comments: A. S. Barr 
Study Number One: L. E. Rostker 
Study Number Two: J. F. Rolfe 


Study Number Three: C. V. La Duke 








$5.00 A YEAR PUBLISHED QUARTERLY $1.50 A Copy 








Published by Dembar Publications, Inc., 114 S. Carroll Street, 
Madison 3, Wisconsin 
Entered as second-class matter October 17, 1938 at the post office at Madison, Wisconsin, 
under the act of March 3, 1879. 





EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis. 


H. H. Remmers, Professor of Educational Psychology, 
Director Di 


supervision, pub- 


J. Wayne Wrightstone, Assistant Director, Bureau of Reference,
Research and Statistics, Board of Education of the City of
New York, 110 Livingston Street, Brooklyn, New York.
Editorially responsible for materials on [...] construction,
published each June.


Palmer O. Johnson, Professor of Education, University of 
Minnesota, Minneapolis, Minnesota. Editorially re- 
sponsible for materials on measurements, statistics, and 
methods of experimental research, published each March. 

Arthur T. Jersild, Professor of Education, Teachers Col- 
lege, Columbia University, New York City. Editorially 
responsible for materials on child welfare, guidance, 
and development, published each December. 


CONTRIBUTING EDITORS 


Emmett A. Betts, Research Professor and Director of
[...], Pennsylvania State College, State College,
Pennsylvania.

Gilbert L. Betts, [...] of Graduate Research in Education,
Colorado State College, Fort Collins, Colorado.

Leo J. Brueckner, Professor of Education, University of
Minnesota, Minneapolis, Minnesota.

Oscar K. Buros, Associate Professor of Education, Rutgers
University, New Brunswick, New Jersey.

G. T. Buswell, Professor of Educational Psychology,
University of Chicago, Chicago, Illinois.


Otis W. Caldwell, General Secretary, The American Association
for the Advancement of Science, Boyce Thompson
Institute for Plant Research, Yonkers, New York.

Leslie L. Chisholm, Associate Professor of Education,
State College of Washington, Pullman, Washington.

Herbert S. Conrad, College Entrance Examination Board,
Princeton, N. J.

Stephen M. Corey, Professor of Educational Psychology,
University of Chicago, Chicago, Illinois.

Edward E. Cureton, [...] Civilian Personnel Research
Subsection, The Adjutant General’s Office, War Department,
162 [...], Larchmont, New York.

Robert A. Davis, Professor of Education, Director of Bureau
of Educational Research, University of Colorado,
Boulder, Colorado.

Harl R. Douglass, Director of College of Education, University
of Colorado, Boulder, Colorado.

Jack W. Dunlap, Associate Professor of Educational
[...], University of Rochester, Rochester, New York.

Harold A. Edgerton, Director, Occupational Opportunities
Service, Ohio State University, Columbus, Ohio.

John C. Flanagan, Chief, Psychological Branch, Office
of the Air Surgeon, Headquarters, Army Air Forces, [...]


Carter V. Good, Professor of Education, Acting Dean, 
School of Education, University of Cincinnati, Cin- 
cinnati, Ohio. 


Robert W. B. Jackson, Assistant Professor of Educational
Research; Assistant Director, Department of Educational
Research, Ontario College of Education, University
of Toronto, Toronto, Canada.


Harold E. Jones, Professor of Psychology and Director, 
Institute of Child Welfare, University of California, 
Berkeley 4, California. 


Noel Keys, Professor of Education and Lecturer in
Human Relations, University of California, Berkeley,
California.


D. Welty Lefever, Professor of Education, University of 
Southern California, Los Angeles, California. 

Edward A. Lincoln, Consulting Psychologist, Halifax,
Massachusetts.


Irving Lorge, Associate Professor of Education, Teachers
College, Columbia University, New York City.


A. R. Mead, Director of Educational Research, University
of Florida, 330 P. K. Yonge Building, Gainesville,
Florida.


T. E. [...], Lt. Comdr., USNR, 2702 Wisconsin Avenue,
N. W., Washington, D. C.


C. W. Odell, Associate Professor of Education, University
of Illinois, Urbana, Illinois.


Willard C. Olson, Professor of Education, Director of Research
in Child Development, University of Michigan,
Ann Arbor, Michigan.


W. E. Peik, Dean and Professor of Education, Univer- 
sity of Minnesota, Minneapolis, Minnesota. 


S. L. Pressey, Professor of Educational Psychology, Ohio 
State University, Columbus, Ohio. 

Clarence E. Ragsdale, Professor of Education, University 
of Wisconsin, Madison, Wisconsin. 

William Reitz, Assistant Professor of Education, College
of Education Examiner, Chief Statistician Air Cargo
Research, Wayne University, Detroit 2, Michigan.


Henry D. Rinsland, Professor of Education and Director
of Educational Research, The University of Oklahoma,
Norman, Oklahoma.


Robert T. Rock, Jr., [...] of Psychology, Head of
Dept. of Psychology, Graduate School, Fordham University,
New York.


Phillip J. Rulon, [...] of Education, Harvard University,
Cambridge 38, Massachusetts.


Douglas E. Scates, Associate Professor of Education,
Duke University, Durham, North Carolina.


David Segel, Educational Consultant, Specialist in Tests
and Measurements, Federal Security Agency, U. S. Office
of Education, Washington, D. C.


Paul W. Terry, Professor of Educational Psychology, University
of Alabama, University, Alabama.


Helen Thompson, Clinic of Child Development, Research
Associate, Yale University, New Haven, Connecticut.


Robert L. Thorndike, Associate Professor of Education,
Teachers College, Columbia University, New York City.

Herbert A. Toops, Professor of Psychology, Ohio State
University, Columbus, Ohio.


Maurice E. Troyer, Director, Bureau of School Services, 
Syracuse University, Syracuse 10, New York. 


Helen M. Walker, Professor of Education, Teachers Col- 
lege, Columbia University, New York City. 


Beth L. Wellman, Professor of Psychology, Child Welfare
Research Station, State University of Iowa, Iowa
City, Iowa.


Guy M. Wilson, Professor of Education, Emeritus, Boston
University, Boston, Massachusetts.

Paul A. Witty, Professor of Education, Director of
Psycho-Educational Clinic, School of Education, Northwestern
University, Evanston, Illinois.


Ernest R. Wood, Professor of Education, New York Uni- 
versity, New York City. 


DEMOCRAT PRINTING COMPANY 
MADISON, WISCONSIN 











Journal of Experimental Education 








Volume XIV 


SEPTEMBER, 1945 


Number 1 








SOME INTRODUCTORY COMMENTS 


A. S. Barr 
University of Wisconsin 


The research herein reported is the product 
of the efforts of many persons working to- 
gether cooperatively. The project was initi- 
ated during the academic year of 1934-35. 
Throughout this year a group of some twenty 
persons representing the State Department of
Public Instruction; the Department of Education,
University of Wisconsin; and a number
of local school systems, met every other
Saturday morning to discuss possible ap- 
proaches to a systematic study of the problem 
of teacher evaluation. The result of this year’s 
work was a statement of the problem and a 
tentative formulation of the procedure to be 
followed in this study. During the academic 
year of 1935-36, this tentative formulation 
was turned over to a graduate seminar in 
Education for their critical analysis and re- 
vision: during the first semester of this year 
the procedure was rewritten in light of ideas 
gained from a careful survey of previous in- 
vestigations; during the second semester the 
procedure was tried out in a preliminary in- 
vestigation and again rewritten. The proce- 
dure as then formulated provided the general 
pattern for the series of investigations here 
reported. The data for these investigations 
were collected for the most part during the 
academic years of 1936-38. The statistical 
analysis of the data consumed the greater
part of the period between 1938 and
1944. The report here presented is a collection
of the several studies arising from the plan
thus projected.¹


¹ The authors are grateful for much help received from
many different persons. First of all, the authors wish to
acknowledge the many hours of valuable time given by the
[...] who, as the report will show, par[...]
investigation. Then, too, the [...]
the large amount of assistance [...]
in the course of this study. Par[...]
should be made of A. G. Hellfritzsch who [...]
given the several contributors to this report much [...]
assistance, and who has read the manuscript in its [...].
Dr. M. H. Willing, Professor of Education; Dr. V. A. C.
Henmon, Professor of Psychology, and Dr. C. J. Anderson,


THE PROBLEM 


To select, guide, and educate teachers as 
effectively as we should we must know much 
more than we do now about the prerequisites 
to teaching efficiency and how to identify and 
describe these prerequisites accurately. The 
purpose of this investigation, or series of in- 
vestigations, is to study these prerequisites, 
their inter-relatedness, and the validity of the 
instruments commonly employed in collecting 
data about them. More specifically an attempt 
is made to answer the following questions: 
(1) What are the prerequisites to teaching 
efficiency, particularly for teachers of the 
social studies in the 7th and 8th grades of 
Wisconsin rural schools? (2) How valid and 
reliable are certain of the instruments com- 
monly employed in measuring teacher effici- 
ency and its prerequisites? And (3) how do 
the prerequisites to teaching efficiency, as 
measured in this investigation, seem to be 
interrelated? Besides these more specific pur- 
poses, it is hoped that this investigation may 
throw some light, too, upon the general nature 
and organization of human abilities. 


There are many important problems in the 
field of teacher evaluation not here studied. 
Those enumerated here seem most appropriate 
to the conditions under which the investiga- 
tion was conducted. 


THE IMPORTANCE OF THE PROBLEM 


There are approximately one and a quarter 
million teachers in this country who teach 
some thirty million pupils. The schools in 
which these pupils and teachers work con- 
stitute one of the country’s most extensive 
enterprises and in a very real sense supply the 


Dean of the School of Education, have given valuable editorial
help. The full list of those who have assisted is too
long to reproduce here. The authors wish to acknowledge too
the greatly appreciated financial aid given by the Works
Progress Administration and the Graduate Research Committee
[...] of Wisconsin, both of whom made sizeable
[...] the support of this project. The assistance
of all of these is most gratefully acknowledged.







foundations for the democratic order for 
which we all strive. Next to the pupil, the 
teacher is the most important single factor in 
this great enterprise. She is the central im- 
pelling force in our educational effort. With- 
out good teachers there cannot be good 
schools. 

To get good teachers there must be wise 
selection and guidance, good preparation, and 
sound employment and placement practices. 
The education of teachers should be predi- 
cated upon discriminating selection. But to 
guide and select wisely, one must have accu- 
rate knowledge of the prerequisites to teach- 
ing efficiency and possess the means of identi- 
fying these prerequisites in a trustworthy 
fashion. The effective education of teachers, 
both before and after they enter service, de- 
pends in a very real sense upon our ability 
to identify progress in attaining teaching effi-
ciency and its prerequisites. Inefficiency in 
evaluation leads to inefficiency in teacher edu- 
cation. Only by knowing the results of our 
efforts to educate teachers may we improve 
the process. Not only do those responsible for 
teacher selection, guidance, and education 
need more precise information relative to 
teaching efficiency, but administrators and 
placement officials, too, need this information 
for effective employment, assignment, and promotion
practices. The bases upon which these
responsibilities are now discharged by many
officials are scarcely worthy of the profession.
The fair treatment of teachers and pupils is
likewise involved in this problem, as is the
quality of the service rendered by schools.
There is already available considerable evi- 
dence, subjective and objective, to indicate 
that current methods of evaluating teaching 
efficiency are inadequate. The teacher, the 
professional educator, the administrator, the 
pupils and the public would all profit by 
better measures of teaching efficiency. 


THE GENERAL PLAN OF THE INVESTIGATION 


It appeared from past experience with 
similar studies that the purposes of this in- 
vestigation might be best served by the use 
of a semi-controlled statistical technique of 
research applied under normal classroom con- 
ditions. The application of this technique as 
here used involved the following steps: (1) 
the definition of the task to be performed; 
i.e., the changes to be sought in the pupils; 
(2) the measurement of pupil growth by the 




application of certain instruments of measure- 
ment before and after teaching; (3) the sys- 
tematic measurement and control of factors 
which seem to condition pupil growth; (4) 
the definition and measurement of certain 
teaching factors chosen for study, and (5) the 
systematic study of the relationships between 
these factors and teaching efficiency as herein 
defined. 

One of the important phases of this study 
is its attempt to define teaching efficiency in 
terms of its effects. This idea is not new, not 
even in the field of professional education,¹ 
but further effort in this direction seemed 


desirable. Much confusion has arisen in the 
field of human evaluation through failure 
to establish a definite point of departure. 
Notable success has been achieved in certain 
of the physical sciences by defining very elu- 
sive phenomena through their effects. The 
same thought has here been applied. 


THE TEACHERS AND PUPILS STUDIED 


The investigations here reported were car- 
ried out in the main with seventh- and eighth- 
grade teachers of citizenship in non-depart- 
mentalized one- and two-room rural schools 
in the state of Wisconsin. Several of the 
studies employed supplementary data from 
other sources. The main body of data for the 
investigation, however, consists of measures 
of three groups of teachers and pupils: (1) a 
group of 24 teachers teaching in state graded 
schools with 342 pupils; (2) a group of 47 
teachers teaching in one-room rural schools 
with 338 pupils; (3) a group of 31 teachers 
with 18: pupils in one- and two-room schools. 
One group of 24 teachers with 194 pupils 
was investigated in a follow-up study extend- 
ing through a period of two years. In several 
instances data were secured relative to other 
teachers and pupils to supplement those 
secured with respect to the main group of 
teachers here studied. Descriptions of these 
and other teachers and pupils will be found 
in the reports to follow. 


THE CRITERION OF TEACHING EFFICIENCY 


The principal criterion of teaching efficiency 
employed in this investigation was a com- 
posite of a number of measures of pupil 
growth and achievement. In certain of the 
studies, composites of the scores on teacher 


¹ William H. Lancelot, A. S. Barr, Gilbert L. Betts and
others, The Measurement of Teaching Efficiency (New
York: The Macmillan Co., 1935).




rating-scales and composites of measures of 
certain qualities commonly associated with 
teaching efficiency constituted other criteria 
of efficiency. Considerable care was taken 
in the development of the criterion. For the 
primary criterion both unit and overall year-
t tests were employed. The overall tests
were given at the beginning and at the end 
of the school year, approximately six months 
apart. The overall tests were chosen to relate 
to the more general purposes of the course in 
citizenship taught by the teachers here under 
investigation. The unit tests were given at the 
beginning and end of two standard tasks, each 
three weeks in length, chosen from the regular 
course in citizenship and defined in terms of 
the accepted objectives of the course. Each 
task was presumably of equal difficulty and 
applied to the instruction of groups of pupils 
of equal capacity, under comparable condi- 
tions. Finally, certain measures employed for 
equating purposes were administered to the 
pupils. The measures used in the equating of 
pupil-groups were those of intelligence, read- 
ing ability, and other factors thought to be 
related to pupil growth and achievement. 


In determining the contribution of each 
teacher to the total learning-teaching situa- 
tion, the growth or achievement of the pupils 
under her direction was determined by sub- 
tracting the initial test scores of her pupils 
from their final test scores. In this fashion a 
pupil change-score was secured for each class. 
The part of each gain-score attributable to 
the effort of the teacher was considered to 
be the difference between the gain-scores 
secured from the application of the tests to 
the pupils at the beginning and end of the 
experimental period, and the score predicted 
for each pupil from measures of the pupils’ 
intelligence and other factors thought to influ- 
ence achievement. The residual pupil gain was 
attributed to the teacher and other uncon- 
trolled factors. The uncontrolled factors were, 
for the purpose of this study, assumed to be 
randomly distributed. The statistical proce- 
dures employed in making these calculations 
and in the development of the several criteria 
of teaching efficiency will be described in the 
special reports to follow. 
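
The gain-and-residual computation just described can be sketched roughly as follows. This is an illustration only, not the procedure actually used, which is left to the special reports. It assumes a single predictor (an intelligence score), one least-squares line fitted to all pupils together, and a simple class average of the residuals; the field names `teacher`, `pre`, `post`, and `iq` are hypothetical, and the demonstration figures are invented.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y ~ a + b*x (assumed model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def teacher_residual_gains(pupils):
    """Average residual gain per teacher.

    pupils: list of dicts with hypothetical keys
    'teacher', 'pre', 'post', 'iq'.
    """
    # Gain-score: final test score minus initial test score.
    gains = [p["post"] - p["pre"] for p in pupils]
    iqs = [p["iq"] for p in pupils]
    # Gain predicted from the pupil factor alone (pooled over all pupils).
    a, b = fit_line(iqs, gains)
    by_teacher = {}
    for p, g in zip(pupils, gains):
        residual = g - (a + b * p["iq"])  # part not explained by the pupil factor
        by_teacher.setdefault(p["teacher"], []).append(residual)
    # The class mean of the residuals is attributed to the teacher
    # (and to uncontrolled factors assumed to be randomly distributed).
    return {t: mean(r) for t, r in by_teacher.items()}


if __name__ == "__main__":
    demo = [  # invented figures, for illustration only
        {"teacher": "A", "pre": 40, "post": 51, "iq": 90},
        {"teacher": "A", "pre": 40, "post": 52, "iq": 100},
        {"teacher": "A", "pre": 40, "post": 53, "iq": 110},
        {"teacher": "B", "pre": 40, "post": 47, "iq": 90},
        {"teacher": "B", "pre": 40, "post": 48, "iq": 100},
        {"teacher": "B", "pre": 40, "post": 49, "iq": 110},
    ]
    print(teacher_residual_gains(demo))
```

With several pupil factors (reading ability, socio-economic status, and so on) the single fitted line would give way to a multiple regression, but the residual would be read in the same way.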

It is probably unnecessary to say that the 
investigators considered the several criteria of 
teaching efficiency employed in this investigation
a very important matter and gave them very
careful consideration. The worth of all that
follows depends in a very real measure upon
the adequacy of the criteria of teaching effi- 
ciency employed. While no attempt has been 
made to measure all of the worthwhile out- 
comes of education, it is hoped that those 
chosen for study in this investigation will illus- 
trate some of the more important problems 
arising from attempts at the systematic evalu- 
ation of teaching efficiency. 


DEFINING THE TASK 


The tasks to be performed by the teachers 
and pupils participating in this investigation 
were defined in terms of unit and course 
objectives. 


To provide a satisfactory point of departure 
for the investigation an attempt was made to 
explore the more general purposes of citizen- 
ship instruction. This exploratory study in- 
volved analyses of expert opinion as found in 
current statements¹ of the purposes of citizenship
education and as revealed in two special
investigations² in this area. The main source
of opinion was found in articles on the teach- 
ing of the social studies, each of which was 
carefully studied for expert statements of the 
purposes of education. The objectives for both 
unit and course assignments were carefully 
defined and made known to the teachers par- 
ticipating in this investigation. 

Considerable time was taken in the word- 
ing of objectives. There appear to be two 
quite different forms in which the purposes 
of education have been stated: (1) state- 
ments in terms of pupil behavior, and (2) 
statements in terms of controls over behavior. 
These latter seem to be of two sorts: (1) 
traits, qualities, and characteristics of indi- 
viduals such as honesty, open-mindedness, 
considerateness, etc., and (2) the mental prerequisites
to successful performance (behavior),
such as knowledges, skills, attitudes,
ideas, interests, and appreciations. Each of 
these forms has its own peculiar advantages 
and disadvantages, educational and psycho- 
logical. A composite approach was employed, 
as an examination of the statement of objec- 
tives of the work in citizenship will indicate. 


¹ Charles A. Beard, The Nature of the Social Sciences, Part [...],
Report of the Commission on the Social Studies of the
American Historical Association (New York: Charles Scribner’s
Sons, 1934).
² Edith Herrin Cocke, A Study of the Objective in Teaching
American Government (Madison, Wis.: University of
Wisconsin, 1936); Barton J. Rooms, The Objective of [...]
Civics (Madison, Wis.: University of Wisconsin, [...]).







TEACHER AND PUPIL MEASURES 


Various measures were applied to both 
teachers and pupils. In choosing pupil meas- 
ures an attempt was made to select those that 
measured not only knowledge but attitudes, 
skills, and behavior as well. While the number 
of measures varied from study to study, some 
twenty-three measures, mostly tests, were 
applied to the pupils altogether. To measure 
some of the less tangible outcomes of citizen- 
ship training, the tests available in this area, 
such as, for example, the Wrightstone tests, 
were supplemented by especially constructed 
questionnaires and rating-scales. The original 
plan included behavior records for each pupil, 
but these somehow became lost in the process. 
As one looks back over the several measures 
employed and the data collected, one can only 
regret that they were not more adequate. 


Some twenty-five different measures were 
applied to the teachers, including measures of 
the teachers’ knowledge of the subject matter 
of the courses taught, intelligence, socio- 
economic status, skill in expression, personal 
fitness, social adjustment, emotional stability, 
teacher-pupil relationships, leadership, and 
interest in teaching. While a very large num- 
ber of measures were applied to the teacher, 
the list of qualities measured was by no means 
complete, and in many areas the measures 
were quite inadequate. It seemed not feasible, 
for example, to measure the teachers’ health, 
energy, and drive. No data seem to have been 
collected, either, with reference to the general 
cultural attainment of the teachers included 
in the study. There are, undoubtedly, many 
other important aspects of teaching ability 
not here measured. 


Other information collected, with reference 
to certain of the teachers, included two sound 
records of lessons taught by each of twenty- 
five teachers; these records constituted a 
major source of data for a detailed activity- 
analysis of teaching. Samples of the work of 
pupils and the teachers’ teaching outlines were 
also collected. All teachers were given, too, the 
unit pretests given the pupils, and each 
teacher filled out a detailed information blank. 
For a listing and a description of the measures 
employed in the several investigations here 
reported, the reader is referred to the indi- 
vidual reports to follow. 




THE COLLECTION OF DATA 


The data were collected for the most part 
by advanced graduate students pursuing work 
for the doctor’s degree in education at the 
University of Wisconsin. All of the teachers 
were visited one or more times, most of them 
many times. In the first investigation the 
tests taken by the pupils were administered 
by the teacher; in the later investigations 
the pupil tests were administered by trained 
investigators. With a few exceptions the 
tests were administered to the teachers, 
individually or in small groups, by trained 
investigators. Care was taken to make the 
data as accurate as possible. More detailed 
information relative to the methods of collect- 
ing the data will be given in the special reports 
to follow. 


THE CONTROL OF FACTORS AFFECTING THE 
OUTCOMES OF INVESTIGATION 


While this investigation was of the field study 
sort, it seemed important to the investigators 
that the factors conditioning pupil growth and 
achievement be controlled as carefully as pos- 
sible. To do this the following precautions 
were taken: 


1. The subjects employed in the investiga- 
tion were chosen from definite types of schools 
and homes, namely, 7th- and 8th-grade pupils 
enrolled in one-room rural and village state- 
graded schools, all engaged in the study of 
citizenship. Each sample was tested for 
homogeneity, and carefully described. 


2. The teachers chosen for investigation 
were all selected from non-departmentalized 
schools. The investigators could discover no 
easy way of discounting the effects of other 
teachers upon pupils in departmentalized 
schools. 

3. An attempt was made to equalize the 
amount of time devoted to the study of citi- 
zenship. Each day’s work was to consist of 
forty minutes in one class period or two 
periods of twenty minutes each. There was to 
be no home work. Where there were field trips 
and other activities these were to be employed 
in such a manner as not to increase the total 
time given to the work in this field. 

4. Realizing that the equipment with which 
teachers would work might vary considerably, 
the investigators attempted to supply each 
teacher, upon request, with supplementary 







reading materials, equipment, and visual aids. 
A list of these materials was supplied each 
teacher. 

5. An attempt was made to equate pupil 
ability through the use of pre-tests in the sub- 
ject matter areas, tests of intelligence, 
measures of socio-economic status, and read- 
ing ability. 


THE STATISTICAL TREATMENT OF THE DATA 


A variety of statistical procedures have 
been employed in the treatment of the data 
collected. The study is, however, principally 
a correlation study supplemented and enriched 
by many other techniques such as case studies, 
regression techniques, and factor analysis. In 
general, an attempt has been made to employ 
up-to-date procedures appropriate to the task 
at hand. The detailed procedures employed in 
this investigation will be described in the 
special reports to follow. 




SPECIAL INVESTIGATIONS INCLUDED IN 
THIS REPORT 


Within the general framework here de- 
scribed seven special investigations were made 
as follows: 
1. The Measurement of Teaching Ability 
(first investigation). L. E. Rostker. 

2. The Measurement of Teaching Ability 
(second investigation). J. F. Rolfe. 

3. The Measurement of Teaching Ability 
(third investigation). C. V. LaDuke. 

4. A Study of the Relationship Between 
Teaching Procedures and Educational 
Outcomes. C. D. Jayne. 

5. The Improvement of Teaching Efficiency 
Through Supervision. C. R. Von Eschen. 

6. Personality and Teaching Efficiency. 
R. E. Gotham. 

7. A Factor Analysis of Teacher Abilities. 
A. G. Hellfritzsch. 





THE MEASUREMENT OF TEACHING ABILITY 
STUDY NUMBER ONE 


L. E. ROSTKER 
New York City 


STATEMENT OF THE PROBLEM 


The central problem of this study was to 
determine the relationship between selected 
teacher measures as applied to 7th- and 8th- 
grade teachers of the social studies in non- 
departmentalized rural and village schools, 
and the changes produced by these teachers 
in their pupils. Differently stated, the purpose 
of this study was to determine the validity of 
selected teacher-measuring devices when vali- 
dated against the criterion of pupil changes. 
Another problem of this study was to deter- 
mine what combination of teacher measures 
gives the highest correlation with teaching 
ability as measured by the criterion of pupil 
changes. 
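
The two questions stated above, the validity of a single teacher measure and the best combination of measures, can be illustrated with ordinary correlation arithmetic. The sketch below is a hedged illustration and not the computation reported in this study: it uses the Pearson product-moment coefficient for a single measure and, for two measures, the standard two-predictor formula R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²). The function names are hypothetical, and the inputs are assumed to be one score per teacher.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def multiple_r_two_predictors(x1, x2, criterion):
    """Multiple correlation of the criterion with two teacher measures.

    Uses R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2), which assumes
    the two measures are not perfectly correlated with each other.
    """
    r1 = pearson_r(x1, criterion)   # validity of the first measure
    r2 = pearson_r(x2, criterion)   # validity of the second measure
    r12 = pearson_r(x1, x2)         # intercorrelation of the measures
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)
```

Here `criterion` would hold each teacher's pupil-change score and `x1`, `x2` two teacher measures. The multiple correlation can never fall below the larger of the two single validity coefficients, which is why combinations of measures are worth examining.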


SECTION I 


THE CRITERION OF TEACHING 
ABILITY 


The criterion of teaching ability accepted 


for this study was the measurable changes 


produced in pupils by their teachers. 

There has been considerable discussion as 
to the advisability of using measurable pupil 
changes as a criterion of teaching ability. 
Pittinger¹ pointed out that pupil achievement 
is not the result of any single teacher’s effort 
but rather the resultant efforts of a number 
of teachers. Symonds² was of the opinion that 
since pupils came to their teachers with vary- 
ing degrees of intelligence and amounts of 
preparation, classroom achievement could be 
no valid measure of teaching ability until 
better professional tests were available. Fritz³ 
concluded that the use of standardized tests 
measured chiefly the memorization of factual 
materials. Shannon⁴ objected to the use of 

¹ B. F. Pittinger, “Problems of Teacher Measurement”,
Journal of Educational Psychology, VIII (1917), pp. 103–[...].
² Percival M. Symonds, “The Measurement of Teaching
Efficiency [...]”, Educational Administration and
Supervision, [...], pp. 217-231.
³ [...] Fritz, “[...] Prediction of Probable Teaching [...]”,
Educational Administration and Supervision, XX
([...]), pp. 133-140.
⁴ [...] Shannon, “Difficulties in Estimating the Efficiency
of Teachers”, The National Elementary Principal, Bulletin
of the Department of Elementary School Principals, XVI,
No. 6 (July, 1937), pp. 524-529.


pupil changes as the criterion of teaching 
ability because the tests available did not 
measure all pupil aspects. 

The use of measurable pupil changes as the 
criterion of teaching ability has, on the other 
hand, found many champions. Courtis⁵ stated 
his position by saying: “The only position I 
am willing to accept must be in terms of 
changes in the pupils taught.” Corey⁶ sug- 
gested that if a large number of teachers were 
employed and experiments were carefully con- 
trolled, measurable pupil changes could be 
used as the criterion of teaching ability. 
Trow⁷ stated that since “the task of the 
teacher is . . . to assist the pupils to learn 
. . . it is on the basis of the extent to which 
pupils do learn that teaching should be 
judged.” 

For this study the assumption was made 
that a good teacher is one who produces 
desirable measurable changes in her pupils. 
Since a teacher is engaged to teach and to 
modify the behavior of her pupils, the degree 
to which changes are produced in her pupils 
is a reflection of the ability of the teacher. 

In accepting measurable pupil changes as 
the criterion of teaching ability, a number of 
assumptions were made. First, that there are 
certain factors associated with pupil perform- 
ance, such as intelligence, reading ability, and 
socio-economic status, which affect pupil per- 
formance, and which factors vary in degree 
from pupil to pupil. It is necessary to elimi- 
nate the varying effects of these factors so 
that whatever pupil changes occur can be 
attributed to the teacher rather than to the 
influence of pupil factors. Secondly, it would 
be difficult to assume identical curricula in 
the number of different classes used in this 
study. Nor is the identity of curricula desir- 
able if teaching is to provide for individual 
differences. The teachers participating in this 

⁵ S. A. Courtis, “The Measurement of the Efficiency of
[...]”, Educational Administration and Supervision,
XVI (1932), pp. 401-412.
⁶ Stephen M. Corey, “The Present State of Ignorance About
Factors Affecting [...] Success”, Educational Administration
and Supervision, [...].
⁷ Wm. C. Trow, “How Shall Teaching Be Evaluated?”,
Educational Administration and Supervision, XX (1934), pp.
264-272.







study were, therefore, told that they would 
be expected to teach two three-week units of 
work, one in the fall of the year, the other in 
the following spring. For these units, the 
teachers were given the desired general objec- 
tives and broad topical outlines and told that 
they could teach whatever subject matter they 
chose, providing that the materials chosen fell 
within the limits of these units. The assump- 
tion was made that a good teacher would 
wisely choose her materials of instruction and 
that the changes made by her pupils would 
tend to indicate whether her choice had been 
wise. The same position was taken with refer- 
ence to methods of instruction. These must 
also be chosen with regard to the varying 
capacities, aims, needs, etc., of the pupils in 
each class. There can be no uniform method 
of teaching if individual differences among 
pupils are to be recognized, promoted, and 
preserved. Each teacher was expected to ad- 
just her methods to her pupils. A good teacher, 
then, is one who wisely and carefully chooses 
her methods of instruction. 

It cannot, however, be over-emphasized 
that the measurable pupil changes obtained in 
this study are limited by the type of tests 
applied to the pupils. The use of pupil changes
as the criterion of teaching ability depends
upon the tests applied to the pupils, and
whatever implications are to be drawn must be
limited by the tests employed.


EXPERIMENTAL PATTERN 


The data for this study were obtained dur- 
ing the school year 1936-37 from 28 seventh- 
and eighth-grade classes offering citizenship, 
in non-departmentalized schools. 

To overcome the criticism that measurable 
pupil changes are the result of the efforts of 
a number of teachers, non-departmentalized 
schools were used. In this way, the measur- 
able changes in pupils for a given school year 
could be attributed to a single teacher. This 
was a restriction rigidly enforced when schools 
were asked to cooperate in this study. Conse- 
quently, the number of eligible schools was 
markedly restricted. 

The initial proposal for this study called 
for a representative sample of approximately 
100 non-departmentalized, eighth-grade 
classes, but when attempts were made to carry 
out this plan, it was realized that such a pro- 
gram, because of financial and clerical diffi- 
culties, would be impossible. It was then nec- 


essary to limit the number of participating 
schools to those within a reasonable traveling 
distance from the center of operations, namely 
Madison, Wisconsin. 

The selection of the seventh and eighth 
grades, as the level on which data were to be 
collected, was purely arbitrary. 


A number of schools meeting the above
requirements, as ascertained from the Official
School Directory of the state of Wisconsin,
were visited immediately after the opening of
the school term in the fall of 1936. The
eligible teachers were informed that
participation in this study was entirely
voluntary on their part, that no compulsion
of any sort would be applied to them, and
that the results obtained from them and their
classes would be treated privately and
confidentially, without transmission of such
data to their principals and supervisors.⁸
The response of the teachers was
overwhelmingly in favor of participation, and
a group of 28 schools located in southern
Wisconsin,⁹ each having at least one eighth
grade, with a total pupil population of 375,
was selected for participation in this study.

The plan proposed and executed was: (1)
to measure pupil performance near the
beginning of the school year and near the
close of the school year so as to obtain
long-time pupil changes occurring over
approximately six months; (2) to measure
pupil performance just prior to and
immediately after the teaching of two
three-week units of work in the general field
of citizenship, one of these units to be
given in the fall of the school year and the
other in the spring of the same school year,
so that pupil changes on two short units of
work would be obtained; and (3) to apply
various measures and rating scales to the
teachers, preferably in the fall of the
school year, with the exception of those
tests taken by both teachers and pupils,
which would be given concurrently.

Several weeks before any teaching of the 
units occurred, each teacher was sent a letter 
in which were stated the dates when testing 
was to begin, a statement of the topics, and 
the general objectives in terms of desired 
goals for which the teachers, teaching the first 


⁸ Mr. L. H. Mathews shared the responsibility of visiting
and procuring a large number of the participating schools, of
administering pupil and teacher tests, and of correcting the
tests obtained from the schools with which he worked.

⁹ The location of the 28 schools is shown on a map of the
state of Wisconsin (plate in the Original Thesis on file at
the University Library, University of Wisconsin).







unit, “Safeguarding Public Health”, were to
strive.¹⁰

The topics to be included in the first unit
were:

1. Securing pure air, food, and sunshine.

2. Disposing of wastes.

3. Providing desirable housing.

4. Caring for the physically and mentally
sick.

5. Recreational opportunities.


The goals toward which the teachers were 
to direct their instruction for this unit were: 
(1) “to acquire the kinds and amounts of 
information essential to the understanding of 
the problems and issues involved in safeguard- 
ing public health”; (2) “to develop skill in 
forming judgments about this subject”; (3) 
“to develop desirable attitudes relative to safe- 
guarding public health”; and (4) “to lead the 
pupils, individually and cooperatively, to some 
positive action relative to safeguarding public 
health”. 

Before any teaching was begun on the first 
of the two units, each pupil included in this 
study took the Kuhlmann—Anderson Intelli- 
gence Test, the Traxler Silent Reading Test, 
the Sims Socio-Economic Score Card, and 
a battery of tests consisting of the three
Wrightstone tests and the three Hill tests.¹¹
At the same time a battery of tests was 
applied to the teachers. 

Several days later, usually at the beginning 
of a school week, the Health test designed to 
measure the work of the first unit, “Safe- 
guarding Public Health”, was given to pupils 
and teachers. 

Teaching on the first unit then continued
for 13 successive school days, and on the
15th day the same test given at the beginning
of the unit as the pre-test was again
administered to the pupils.

Following the close of the first unit, several 
months intervened in which the teachers re- 
sumed their normal course of study. In the 
Spring of the same school year (early March, 
1937), the participating teachers were in- 
formed that on certain dates they would be 
requested to begin teaching a three-week unit
on “Community Planning”.

As in dealing with the first unit, the par- 
ticipating teachers were sent a list of the 

¹⁰ See Appendix C (Original Thesis on file at the University
Library, University of Wisconsin) for the instructions
submitted to the teachers.

¹¹ These tests are described in the following section.




topics to be covered and the desired goals of
instruction to be achieved.¹²

The topics covered in this unit were: (1) 
layout of streets; (2) building zones; (3) 
beautifying the community; (4) keeping the 
community clean; and (5) recreational facil- 
ities. 

The goals of instruction sought for this unit
were the same as those sought in the first unit
with the exception of differences in subject
matter.


Following the same procedure as in the 
Health unit, the Community Planning test, 
designed to cover the work and goals of the 
second unit was applied to both teachers and 
pupils. The teachers then taught this unit, 
employing whatever methods, materials, sub- 
ject matter, etc., they saw fit to use, for 13 
successive school days. On the 15th day, the 
same test, used at the beginning of this unit 
as an initial test, was again administered to 
the pupils. 

About two weeks after the final test on the 
second unit had been applied to the pupils, 
the pupils were given the same battery of 
tests, three Wrightstone and three Hill tests, 
which they took in the preceding fall. In this 
way, pre- and final-test results, as a long-time
measure of change, were obtained from the 
pupils. 

As has been remarked, a battery of tests 
was applied to the teachers in the fall of the 
school year. This battery consisted primarily 
of those tests for which there were definite 
time limits to be observed; the teachers were 
requested to complete the other tests at their 
leisure since there were too many tests to be 
taken in a relatively short period of time. 
Later in the year, the same teachers were 
rated by their principals or supervisors on a 
battery of three rating scales; these teachers 
were also rated by the investigators, using 
this same battery. 

The following testing schedule of one of the 
participating schools will illustrate the 
sequence of test administration: 


Sept. 29, 1936: Mailed letter to teacher
stating topics and desired goals for the
unit, Safeguarding Public Health, and
advising teacher that on Oct. 13 and 14
a battery of tests would be given to her
pupils; teaching of the unit to begin on
Oct. 19.

Oct. 13-14, 1936: Battery of Kuhlmann-
Anderson Intelligence Test, Traxler Silent
Reading Test, Sims Socio-Economic
Score Card, Wrightstone and Hill tests
applied to pupils by investigator; a number
of teacher tests also applied at the
same time.

Oct. 19, 1936: Health unit test applied by
investigator to pupils and teacher; this
test used as pre-test for the first unit.

Oct. 20-Nov. 9 (inc.), 1936: Teacher
taught unit on Health. (Wisconsin State
Teachers Convention met on Nov. 5, 6,
and 7, so that no classes were held on
Nov. 5 and 6.)

Nov. 10, 1936: Investigator repeated
Health unit test to pupils; this test now
used as measure of final test status.

Nov. 23, 1936: Received principal’s rating
of her eighth-grade teacher on battery of
three rating scales.

Mar. 1, 1937: Sent teacher a list of topics
and goals for the second unit, Community
Planning, and advised her that pre-test
would be given on Mar. 12, and final
test on Apr. 2.

Mar. 12, 1937: Investigator applied to
pupils and teacher the Community Planning
test as the pre-test for the second
unit.

Mar. 15-April 2, 1937: Teacher taught
second unit, Community Planning. (No
school on Good Friday, Mar. 26.)

April 2, 1937: Investigator gave unit test
on Community Planning to pupils. Test
used as measure of final status for this
unit.

April 19, 1937: Battery of Wrightstone
and Hill tests given to pupils as final test
to measure long-time changes.

¹² See Appendix C (Original Thesis on file at the University
Library, University of Wisconsin) for the instructions
submitted to the teachers.

¹³ Mr. Lee Mathews and the author rated those teachers
with whom each was working.
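The count of 13 teaching days in the sample schedule can be checked by simple calendar arithmetic. The following is a minimal sketch in Python, assuming a Monday-to-Friday school week; the function name `school_days` and the holiday lists are illustrative and are not part of the original study.

```python
from datetime import date, timedelta

def school_days(start, end, holidays=()):
    """Count Monday-Friday dates from start to end (inclusive), skipping holidays."""
    closed = set(holidays)
    count = 0
    day = start
    while day <= end:
        # weekday() is 0 for Monday through 4 for Friday
        if day.weekday() < 5 and day not in closed:
            count += 1
        day += timedelta(days=1)
    return count

# First unit: taught Oct. 20 through Nov. 9, 1936, with no classes
# on Nov. 5 and 6 because of the teachers convention.
first_unit = school_days(
    date(1936, 10, 20), date(1936, 11, 9),
    holidays=[date(1936, 11, 5), date(1936, 11, 6)],
)

# Second unit: Mar. 15 through Apr. 1, 1937 (the final test fell on Apr. 2),
# with no school on Good Friday, Mar. 26.
second_unit = school_days(
    date(1937, 3, 15), date(1937, 4, 1),
    holidays=[date(1937, 3, 26)],
)

print(first_unit, second_unit)  # 13 13
```

Under these assumptions both units come to 13 teaching days, with the final test falling on the next school day in each case.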


For each school, the time schedule was
carefully controlled so that the periods
between the units and the total time period
between the initial testing and final testing
on the battery consisting of the Wrightstone
and Hill tests were constant, the total period
being approximately six months. Since all of these
tests had to be administered by two investi- 
gators, it was necessary to start the testing 
program in these schools at different intervals. 


The first group started on October 13; the 
second group started on October 21, and a 
third group was started on November 18. 

The units employed were selected because 
work of a similar nature was usually included 
in eighth-grade classes and because the work 
of these units was suggested in the state course 
of study. Teachers were urged to build these 
units as they would any other teaching units 
and not give them extraordinary preparation. 

In the original plan of this study, the deci- 
sion had been made that no teacher should be 
permitted to administer to her pupils any of
the tests used. It was decided that the
administration of tests would be only by the
principals or supervisors of the participating
schools or by the experimenter. Because of
the lack of sufficient clerical aid, it was found 
necessary, however, to permit teachers, in a 
number of instances, to administer the tests 
to their pupils. The bulk of the testing was, 
nevertheless, not administered by the teach- 
ers. In every case, the number of tests admin- 
istered was carefully checked and collected by 
the experimenter as soon as possible, so that 
no test would remain in the teacher’s posses- 
sion for any length of time. No teacher cor- 
rected any of the tests administered either to 
her or to her pupils. The tests were corrected 
by the two investigators, and test results, giv- 
ing each pupil’s raw score and upper and 
lower quartiles, mean, median, and sigma for 
each test for each class, were submitted to the 
proper teachers. Each teacher was also sent 
the upper and lower quartile points and 
median for each test for the whole pupil group 
used in this study so that if interested, a 
teacher could ascertain how her group com- 
pared to the total pupil population. 
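The class summaries returned to each teacher can be illustrated as follows. This is a minimal sketch in Python under stated assumptions: the scores are hypothetical, the quartile points use the inclusive interpolation rule of the standard library, and sigma is taken as the population standard deviation. The study does not state which conventions were actually followed.

```python
import statistics

def class_summary(scores):
    """Quartile points, mean, median, and sigma for one class on one test."""
    lower, _, upper = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "lower_quartile": lower,
        "median": statistics.median(scores),
        "upper_quartile": upper,
        "mean": statistics.mean(scores),
        "sigma": statistics.pstdev(scores),  # assumption: population form
    }

# Hypothetical raw scores for a class of 11 pupils on one test.
scores = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34, 36]
summary = class_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A teacher could set these figures beside the quartile points and median of the whole pupil group to see how her class compared.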

Each class was visited at least five times 
during the course of this study; a large num- 
ber of schools was visited as many as ten and 
twelve times. This was considered to be of 
importance especially since the investigators 
also rated the participating teachers with the 
same teacher rating scales as used by the 
principals or supervisors. 

The design of this study had been carefully 
considered so that no advantage to any class 
might result which would tend to invalidate 
the conclusions made from the data collected. 


PARTICIPATING TEACHERS 


The teachers participating in this study
varied in a number of respects. From Table I,
it is perceived that 17 teachers were women
and 11 were men. The chronological age range
of this group was 24 years to 54 years with a
median age of 33 years and an average age
of 35.4 years.

The range of total teaching experience was
from 2 to 31 years with the median at 11
years and the average at 12.2 years. The
median number of years these teachers had
taught in the participating schools was 6 years
while the average was 6.6 years. About
five-sevenths of this group of teachers were
teaching combined 7th and 8th grades.

With few exceptions, this group had
graduated from state normal schools, and several
had continued their education by taking
additional work at the University level. Over half
of this group indicated that at one time they
had received incomes from sources other than
teaching; in a few cases, several of this group
had, to some degree, been self-supporting
while in their periods of professional training.
The median teacher participating in this
study can be described as a woman about 33
years old, teaching a combined 7th- and
8th-grade class, who had graduated from a three
year normal school course and who had been
teaching a total of 11 years, of which the last
six years had been spent in the participating
school. She may have had some additional
experience besides teaching, but had not done
much to further her own formal education
after graduating from the normal school.


TABLE I
INFORMATION ABOUT THE TEACHERS PARTICIPATING IN THE STUDY

Columns: School; Sex; Age (Sept. 1936); Grades Taught
(Sept. 1936); Grades Used in Study; Yrs. in Present
Location; Total Teaching Experience; Type of Institution
from which Graduated; Additional Professional Work;
Additional Experience.

Sex, in table order: M, F, F, F, M, F, M, F, F, M, F, M, M,
M, F, M, F, F, M, F, M, F, M, F, F, F, F, F.


TABLE II
NUMBER OF PUPILS PARTICIPATING IN STUDY BY SCHOOL AND GRADE

Columns: School; Grade; No. of Pupils.


PARTICIPATING PUPILS 


A total of 375 pupils participated in this
study (Table II).¹⁴ The range of the number
of pupils per class was from 6 to 35 with a
median of 11 and an average of 13.4. It is
obvious that these classes were small, but this
was due to the fact that the schools used were
town and district schools since most of the
larger schools in southern Wisconsin were
departmentalized.

¹⁴ The original group of 400 was reduced to 375 because
of absences in class work and because of transfer to other
schools.

Unfortunately, the pupils used in Schools 
X and XVII comprised the combined 7th and 
8th grades in these schools. This was neces- 
sary because the same curriculum in the social 
studies was taught to both of these grades. 
In School XXVI, the 7th-grade curriculum 
was similar to the 8th-grade social studies 
curriculum of the other schools used in this 
experiment. Initially, these schools (X, XVII, 
and XXVI) were intended for final inclusion 
with the others on the condition that there be 
no significant differences in the abilities of 
7th- and 8th-grade pupils to gain on the 
measures used in this study. This condition
was subsequently found to be true.¹⁵


¹⁵ The difference between the mean raw score gain of 49
seventh-grade pupils and 326 eighth-grade pupils on a
composite of unit, Wrightstone, and Hill tests was 4.03, and the
sums of squares were 10735.27 and 125828.81 respectively.
The value of t corresponding to this difference is 1.85,
which is less than 1.96, the .05 value for t. See R. A. Fisher,
Statistical Methods for Research Workers, 6th edition
(Edinburgh, Scotland: Oliver and Boyd, 1936), Section 24.1.


TABLE III
INFORMATION ABOUT THE SCHOOLS PARTICIPATING IN THE STUDY

Columns: School Number; Population of Town (1930 Census);
Principal Occupation of Region; Type of School; Total Elem.
Sch.; Total H.S. Pupils; Total of Pupils in Bldg.

Principal occupations listed: Agriculture; Agric; Mining;
Agric; Ry. Shop; Agric; Manuf; Manufacturing; Manuf; Commer.
Types of school listed: Elementary; Elem; HS.






SCHOOL INFORMATION


The schools in this experiment, as indicated
in Table III, represented both urban and rural
areas. A number of these schools, VI, VIII,
XI, XVI, XIX, XXIV, XXVI, and XXVII,
were located in what were formerly prosperous
mining areas; the rest of the schools were
located in farm and dairy regions.

The median school could be described as 
an elementary school enrolling 125 pupils 
with a staff of 4 teachers, located in a town 
of about 790 inhabitants, in a rural region of 
southern Wisconsin. 


SECTION II 


DESCRIPTION OF PUPIL AND 
TEACHER MEASURES 


PUPIL TESTS


The purpose of this section is to describe
the measuring instruments applied to the
pupils and teachers participating in this
experiment.¹⁶

(1) The intelligence of each pupil used in
this study was determined by application of
the Kuhlmann-Anderson Intelligence Test,
Fourth Edition, Grades VII-VIII, 1933.¹⁷
This group test consists of ten scaled sub- 
tests with the scores on these sub-tests ex- 
pressed as mental ages. The median of these 
mental ages gives the total mental age for 
each pupil which, when divided by the appro- 
priate chronological age, gives the pupil’s 
intelligence quotient. 
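The scoring rule just described can be sketched as follows. This is a minimal illustration in Python, assuming that ages are expressed in months and that the quotient is multiplied by 100, as was conventional for ratio I.Q.s; neither detail is stated in the text, and the sub-test values are hypothetical.

```python
import statistics

def intelligence_quotient(subtest_mental_ages, chronological_age):
    """Ratio I.Q. from the median of the sub-test mental ages.

    Both ages are taken in months (an assumption of this sketch).
    """
    mental_age = statistics.median(subtest_mental_ages)
    return 100 * mental_age / chronological_age

# Hypothetical mental ages (in months) on the ten sub-tests for one pupil.
subtests = [150, 156, 158, 160, 162, 164, 166, 170, 172, 180]
iq = intelligence_quotient(subtests, chronological_age=156)
print(round(iq, 1))
```

Taking the median rather than the mean keeps one unusually high or low sub-test from shifting the pupil's total mental age.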

The authors of this test, deciding that the
methods commonly used to determine
reliability did not adequately apply to
psychological tests, did not report the
reliability of their instrument. Miller,¹⁸
however, in comparing a battery of ten group
intelligence tests, found that the
Kuhlmann-Anderson test, when applied to a
number of 7th-grade classes, yielded a
reliability coefficient of .92.

This test was used in the present study be- 
cause of its wide use in educational experi- 
ments and because it is designed to test lim- 
ited rather than large ranges of grades. 

(2) Since it had been assumed that pupil
changes might be affected by the reading
¹⁶ Sample copies of all tests applied to pupils will be
found in Appendix D. (See Original Thesis on file at the
University Library, University of Wisconsin.)

¹⁷ Published by the Educational Test Bureau, Inc.,
Minneapolis, Minn.


% Earl Miller, “A Come Study of Ten Group Intel- 
ce Tests on the igh Schoo! —_— (Unpublished 
. thesis) University of Wisconsin, 




ability of pupils, each pupil in this experiment 
was given the Traxler Silent Reading Test, 
Form 1, Grades 7 to 10.¹⁹ The three parts of
the test were designed to measure (a) reading 
rate and story comprehension, (b) vocabulary, 
and (c) paragraph comprehension. 

Reading rate is measured by the number 
of words read in ten seconds, and story com- 
prehension is measured by ten multiple-choice 
questions. The basis for the measurement of 
vocabulary was Thorndike’s “The Teacher’s 
Word Book.” Paragraph comprehension is 
measured by having the pupil read six scaled
paragraphs and attempt to answer a number
of questions on each of these paragraphs.

The reliability of the total test score, 
obtained by intercorrelating duplicate forms 
of the test, is given as .o1 by Traxler. 

(3) Educators and psychologists have for
some time been aware that pupils’ school
performance is conditioned by their social
environment and economic status. Both
Freeman²⁰ and Burks²¹ concluded that I.Q. can
be altered by changes in the home
environment. Galton in his Hereditary Genius and
Terman in his Mental and Physical Traits of
a Thousand Gifted Children found that
superior children came more often from homes of
above average social and economic levels than
from homes below average in these factors.

Several attempts to measure home
environment have yielded non-quantitative results.²²
The Sims Score Card for Socio-Economic
Status, Form C,²³ offers a quantitative measure
of social and economic status. For this reason
the Sims Score Card was used in this study.

The preliminary form of this card was
applied by Sims to 686 sixth-, seventh-, and
eighth-grade pupils representing various social
and economic levels. Bi-serial correlations
were found for each item with each of the
other items of the scale, and correlations were
found between each item and the criterion
scores, which consisted of the average score of
all the remaining items. In its final form this card

¹⁹ The Traxler Silent Reading Test (Bloomington, Ill.:
Public School Publishing Co.).

²⁰ Frank N. Freeman and others, “The Influence of
Environment on the Intelligence, School Achievement, and Conduct
of Foster Children”, National Society for the Study of
Education, 27th Yearbook, Part 1, 1928.

²¹ Barbara S. Burks, “The Relative Influence of Nature and
Nurture upon Mental Development”, National Society for
the Study of Education, 27th Yearbook, Part 1, 1928.

²² Among these may be listed: J. H. Williams, “Whittier
Scale for Grading Home Conditions”, Bulletin, Whittier
State School.

²³ Verner M. Sims, The Measurement of Socio-Economic
Status (Bloomington, Ill.: Public School Publishing Co.,
1928).







consists of 23 items having high correlations 
with the criterion but low intercorrelations 
with each other. 

Using a group of 100 paired siblings, Sims
found this scale to have a reliability of .94.²⁴
For a group of 72 paired siblings,²⁵ the
experimenter of the present study found a
coefficient of reliability of .84 by use of
Furfey’s intraclass correlational method.²⁶

According to the plan of this study, objec- 
tives other than information were to be 
measured. Results were sought for such goals 
as skill in forming judgments and desirable 
changes in attitudes. Very fortunately, several 
social study tests devised by Wrightstone un- 
der the auspices of the Progressive Education 
Association were available. Upon the basis
of desired objectives submitted by a group of
teachers engaged in teaching an experimental
curriculum, Wrightstone had devised a
battery of five tests for the social studies field.²⁷
From this battery, three tests, “Applying
Generalizations to Social Studies Events,”
“Abilities to Organize Research Materials,” and
“A Scale of Civic Beliefs,” were administered
twice to the pupils in this experiment; the first
giving of these tests was prior to the teaching
of the first three-week unit of work.²⁸

(4) Applying Generalizations to Social- 
Studies Events was constructed on the basis 
of a list of generalizations which a group of 
teachers expected their pupils to reach. This 
list was checked for validity against news- 
paper articles, textbooks, and reference mate- 
rials used in the social studies in classrooms. 
For this test, Wrightstone reports for grades 
X-XII, inclusive, a coefficient of reliability of 
.92 obtained by applying the Spearman-Brown
prophecy formula to the correlation of scores
on odd-even halves of the test.²⁹

(5) Abilities to Organize Research Mate- 
rials, Revised Form, consists of five parts: 
Part I, Ability to Recognize a Suitable Topic 
for Research; Part II, Ability to Separate 
Irrelevant from Relevant Material; Part III, 
Ability to Sense Logicality; Part IV, Ability 

²⁴ Ibid., p. 33.


ae a number of siblings from ~ 

umished by Kir. R R. E. Gotham, Mr. G. E. Carlson 
—_ A. Madsen from a series of experiments which = 
ve 
²⁶ P. H. Furfey, “A Formula for Correlating Interchangeable
Variables”, Journal of Educational Psychology, XVIII
(1927), pp. 122-124.

²⁷ J. Wayne Wrightstone, “Measuring Some Major Objectives
of the Social Studies,” School Review, XLII (1935).


Tne ee See Wk Ge SS ef Se. 2. W. 




to Co-ordinate and Subordinate Appropriate 
Data; and Part V, Ability to Organize an 
Outline. The items for this test were de- 
rived from the instructional outcomes desired 
by the teachers, and from the type of activ- 
ities engaged in by pupils. For this test, 
Wrightstone reports for grades X-XII,
inclusive, a reliability of .88 obtained by applying
the Spearman-Brown prophecy formula to the
correlation of odd-even test scores.³⁰

(6) The Scale of Civic Beliefs is an attempt 
to measure civic attitudes and beliefs with 
regard to racial attitudes, international atti- 
tudes, national political attitudes, and national 
achievements. This test consists of 80 true- 
false items. Wrightstone checked these items 
for liberalism against editorials in such maga- 
zines as the Nation and the New Republic 
and also obtained opinions from a group of 
social scientists as to the liberalism or con- 
servatism associated with these items. Upon 
applying the Spearman—Brown prophecy 
formula to the correlation of odd-even scores, 
Wrightstone found this test to give a
coefficient of reliability of .94 for grades X-XII.³¹

For these tests, Wrightstone reported the
following intercorrelations:³²


                                     1
1 Applying Generalizations         1.00
2 Abilities to Organize Research
3 Scale of Civic Beliefs

The intercorrelations obtained for the same
tests used as initial tests for the total pupil
population of 375 used in this study were:


                                     1
1 Applying Generalizations         1.00
2 Abilities to Organize Research
3 Scale of Civic Beliefs


The small size of the intercorrelations given
above indicates that this battery of tests
measures, in spite of a small degree of
overlapping, different functions.

A battery consisting of:


(7) Hill, Test in Civic Attitudes
(8) Hill, Test in Civic Information
(9) Hill-Wilson, Test in Civic Action³³
was given to each pupil prior to the teaching
of the first three-week unit and following the
completion of the second three-week unit of

³⁰ Ibid., p. 776.
³¹ Ibid., p. 776.
³² Ibid., p. 778.

³³ These tests are published by the Public School
Publishing Co., Bloomington, Illinois.







work. Each of these tests consists of 20 
multiple-choice items with each item per- 
mitting one correct answer from five possible 
choices. The authors of this battery report
that the items used were selected upon the
bases of adult experiences, courses of study,
civics textbooks, opinions of classroom
teachers and subject experts, and upon
tryout with junior and senior high-school pupils.
No report on the reliability of these tests,
however, is given by the authors.

Analysis of the items of these three tests 
seemed to indicate that in spite of the titles 
assigned to two of these tests, information was 
measured to a large degree by all of them. 
The intercorrelations of these three tests when 
applied as initial tests for the population of 
375 pupils included in this study, gave the 
following results: 


                                   1      2      3
1. Hill, Civic Attitudes         1.00          .43
2. Hill, Civic Information              1.00   .50
3. Hill-Wilson, Civic Action                   1.00


That these intercorrelations are low is prob- 
ably due to the low reliability of the indi- 
vidual tests rather than to the differences in 
objectives measured. 

For measuring the outcomes from the
teaching of the two three-week units,
Safeguarding Public Health and Community
Planning, two objective tests were constructed. The
Health test was applied to the pupils imme- 
diately before and after they had studied the 
first unit and the test on Community Plan- 
ning was similarly applied immediately before 
and after the teacher had taught the second 
of the three-week units. 

(10) The Health Test, Form A,³⁴ consisted
of 48 true-false items, 11 matching items, and 
24 multiple choice items with each of the 
latter items permitting of one correct answer 
from five possible choices. While it had been 
hoped to devise this test to measure objectives 
other than information, careful inspection in- 
dicates that this test measures primarily in- 
formational outcomes. 

(11) The Test on Community Planning, 
Form A, consisted of 48 true-false items, 14 
multiple-choice items each having five pos- 
sible choices, 14 matching questions, and 
several items designed to measure pupil 
action. It was hoped that this test would 


% The original form of the test devised by Mr. ©. ae 
and a class in educational measurement at the State 
ne Wisconsin, was modified for 




measure objectives other than information by 
including items similar to the following: 


“Despite the fact that cities have street 
cleaning equipment, each individual in a 
city should feel responsible in helping to 
keep the streets clean. 

School pupils ought to have the right to 
throw candy wrappers where they please. 

Since most hospitals use coal, they 
should be located near industrial areas. 

Approximately how many discussions 
concerning Community Planning have you 
had with persons outside of your school 
during the past two weeks?” 


The test items used in these unit tests were 
carefully checked for validity with the text- 
books and reference materials most often used 
by the 8th-grade teachers in teaching health 
and community planning. 

These two unit tests, with an intercorrelation
of .53 as measures of initial status, are
characteristic of the type of objective tests
designed by teachers to measure pupil
performance.

For each of the pupil tests, except for the
Kuhlmann-Anderson Intelligence Test, the Traxler
Silent Reading Test, and the Sims Socio-Economic
Score Card, it was possible to obtain three
coefficients of reliability: (1) for the test as
a measure of initial status; (2) for the test as
a measure of final status; and (3) for the test
as a measure of pupil change. The reliabilities
of initial and final status were obtained by
taking a random sample of 125 cases from the
total pupil group of 375 cases, dividing the
initial and final tests into odd and even halves,
correlating these halves for the proper test, and
correcting the obtained coefficients of
reliability by use of the Spearman-Brown formula.
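
The split-half procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation actually carried out in the study: the function names and the layout of `item_scores` (one list of item scores per pupil) are hypothetical, and the sample data are invented.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step up a half-test correlation to the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per pupil (assumed layout)."""
    # Odd-numbered items are positions 1, 3, 5, ... (indices 0, 2, 4, ...).
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))


if __name__ == "__main__":
    # Invented right/wrong (1/0) item scores for five pupils.
    pupils = [
        [1, 1, 0, 1, 1, 0],
        [1, 0, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0],
        [1, 1, 1, 0, 1, 1],
    ]
    print(round(split_half_reliability(pupils), 3))
```

The stepped-up coefficient estimates the reliability of the whole test from the agreement of its two halves, which is why a half-test correlation of .50 corresponds to a full-test reliability of about .67.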

The coefficients of reliability for these tests
used as measures of pupil change were calculated
by subtracting, for each corresponding pair of
initial and final pupil tests, the odd score on
the initial test from the odd score on the
corresponding final test, and likewise the even
score on the initial test from the even score on
the final test. This gave, for each test so
considered, a new series of odd changes and a new
series of even changes which, when correlated and
“stepped up” by the Spearman-Brown formula, gave
the coefficient of reliability of the test used
as a measure of pupil change.**
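
The reliability of a change score, as described above, can be illustrated with a short Python sketch. It assumes that each pupil's odd-half and even-half scores on the initial and final tests are already available as four parallel lists; the function names and the data are hypothetical and serve only to make the steps concrete.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def change_score_reliability(init_odd, init_even, final_odd, final_even):
    """Split-half reliability of final-minus-initial change scores."""
    # Change on the odd half and on the even half, pupil by pupil.
    odd_change = [f - i for i, f in zip(init_odd, final_odd)]
    even_change = [f - i for i, f in zip(init_even, final_even)]
    r_half = pearson_r(odd_change, even_change)
    # Spearman-Brown step-up from half length to full length.
    return 2 * r_half / (1 + r_half)


if __name__ == "__main__":
    # Invented half-test scores for six pupils.
    init_odd = [10, 12, 9, 14, 11, 8]
    init_even = [11, 12, 8, 13, 12, 9]
    final_odd = [14, 13, 12, 15, 15, 9]
    final_even = [13, 14, 12, 15, 14, 11]
    print(round(change_score_reliability(init_odd, init_even,
                                         final_odd, final_even), 3))
```

Because each change score carries the errors of both the initial and the final half-test, the coefficient obtained this way will usually fall below the corresponding status reliabilities.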


** A summary of the tests applied to pupils is given in
Table V. A. G. Hellfritzsch gave valuable advice on the
statistical aspects of this study. Assistance in scoring the
tests and making elementary statistical analyses was
supplied by the Works Progress Administration.







TABLE IV
COEFFICIENTS OF RELIABILITY OF PUPIL TESTS
(Raw Score)
N=125

Test (each reported for Final, Initial, and Change scores)

Wrightstone: Abilities to Organize Research Materials
Wrightstone: Scale of Civic Beliefs
Wrightstone: Applying Generalizations to Social Studies Events
Hill: Test in Civic Attitudes
Hill: Test in Civic Information
Hill-Wilson: Test in Civic Action

[The columns of coefficients, means, and standard deviations are not legible in the source.]

*These tests were weighted in the final criterion in proportion to their reliabilities. The reliability
of the final composite change score was .76.


Table IV indicates the coefficients of
reliability for the unit, Wrightstone, and Hill
tests (expressed in raw score units) as measures
of final status, initial status, and pupil
change, obtained from a constant sample of 125
cases. The mean and standard deviation for each
odd and even half used in obtaining these
correlations are also indicated.

The coefficients of reliability of tests used 
as measures of pupil change should be inter- 
preted in the same manner as are the coeffi- 
cients of reliability of the initial and final 
tests. That the reliability coefficients for
tests used to measure pupil change are lower than
those obtained for initial and final pupil test
status is probably due to the fact that the
reliability coefficients of pupil change contain
the errors of measurement of both the initial and
final applications of the tests used.


TEACHER TESTS*


(1) The American Council on Education 
Psychological Examination for College Fresh- 
men, 1936 edition,** (by L. L. Thurstone and 
Thelma Gwin Thurstone) was applied to each 
teacher cooperating in the experiment. This 
test consists of five parts: (1) Completion 
made up of 40 sentences from each of which 
one word is missing. The subject is to supply 
the correct missing word wherever such an 
omission is indicated; (2) Arithmetic made 
up of 20 problems which are intended to cover 
the principal divisions of the field of arith- 
metic; (3) in the section on Artificial Lan- 
guage the subject is given a list of ten artifi- 


* [...] will be found in Appendix E of Original Thesis
on file in the University Library, University of Wisconsin.
** Published by the American Council on Education,
Washington, D. C.







TABLE V
SUMMARY OF TESTS APPLIED TO PUPILS

Test

Kuhlmann-Anderson Intelligence Test, Grades VII-VIII
Traxler Silent Reading Test, Grades VII-X
Sims Score Card for Socio-Economic Status
Health
Community Planning
Wrightstone: Abilities to Organize Research Materials
Wrightstone: Scale of Civic Beliefs
Wrightstone: Applying Generalizations to Social Studies Events
Hill: Test in Civic Attitudes
Hill: Test in Civic Information
Hill-Wilson: Test in Civic Action

[Entries for the Form, Maximum Possible Score, and Approximate Working Time in Minutes columns are not legible in the source.]


cial words and rules for the formation of 
plural number, past and future tenses, nouns, 
adjectives, and adverbs, which are to be used 
to translate 30 sentences, some of which are 
translated from the English to the artificial 
language, while others are translated from the 
artificial language to the English; (4) Ana- 
logies consists of 29 sets of geometric figures. 
For each set, 2 geometric figures are given for 
which the subject is to perceive a relationship. 
For a given third figure the subject selects 
one of five possible choices that bear the same 
relationship to the third as that of the first 
two figures; and (5) Opposites consists of a 
list of 33 sets of four words in each set. Two 
of the words in each set have either opposite 
or the same meanings, and the task of the 
subject is to indicate which two words are 
opposite or the same in meaning. 

This test was selected because of its widespread
use in educational research and because of its
high reliability (.93 to .98).

(2) The participating teachers were also 
subjected to a second psychological test, The 
Teachers College Psychological Examination, 
1934 Edition,** which was devised by repre- 
sentatives of several midwestern teacher- 
training institutions. 

This examination consists of six parts: (1) 
for Vocabulary are listed 80 key words each 
of which is accompanied by five other words. 
The subject is to select from each group of
five words that word which, in meaning, is
most nearly like the key word; (2) the test
on Number Series consists of 25 rows of num- 
bers, each row being made up by some com- 
bination of numbers, such as 3, 6, 9, 12, 15, 


** Published by the State Teachers College, St. Cloud,
Minnesota.




18. For each row, the subject is to ascertain 
the combination used and then extend the 
given series by two more numbers; (3) the
Same-Opposite test consists of 25 rows of four
words each. For each of these rows, the subject
is to indicate which two words are opposite in
meaning or which two words are alike in meaning;
(4) Arithmetic Reasoning consists of 15 problems
in arithmetic which the
subject is to solve; (5) the Completion Test
is made up of 35 sentences from each of 
which one missing word is to be supplied by 
the subject; and, (6) Analogies consists of 35 
rows of words, geometric figures, or numbers. 
For each row, the subject is to perceive the 
relationship between two given signs and is 
to select from 5 possible choices that word, 
figure, or number which will bear a similar 
relationship to a given sign as the first two 
given conditions bore between themselves. 

On the basis of total raw scores, a reliability 
of .95 was reported. Since this test was stand- 
ardized on teacher college populations and 
since most of our teachers were graduates of 
such teacher training schools, it was thought 
advisable to include this test in the battery 
given to the teachers. 

(3) The American Council Civics and 
Government Test, Form B,** (by Robert D. 
Leigh and others), which is primarily a test 
of information in American civics and gov- 
ernment was also applied to each teacher. 
For this test a reliability of .82 for 126 cases 
is reported. 

(4) In order to obtain the personal judg- 
ment of the participating teachers on current 
social issues and problems, Parts I and III of 

** Published by World Book Co., Yonkers-on-Hudson, N. Y.







the test on Social Attitudes of Secondary
School Teachers* were applied.

Part I, consisting of 106 true-false test 
items concerned with controversial issues call- 
ing for answers dependent upon judgment and 
opinion, was based upon concepts found pri- 
marily in “high grade journals of opinion.” 

Part III, Public Affairs Information Test,
consisted of 100 true-false items designed to 
measure social science information and 
knowledge of recent or current national affairs. 
The key for this test was derived from judges’ 
opinions which represented liberal and con- 
servative tendencies on the issues covered. 

By use of the “split-half” method, reliability
coefficients of .94 for Part I and .93 for
Part III were obtained from a random sample
representing approximately 3700 public secondary
school teachers in both urban and rural areas in
all but five of the 48 states.

(5) A Scale for Measuring Attitude Toward
Teachers and the Teaching Profession (by Tressa
C. Yeager) was administered to each cooperating
teacher. This scale was constructed by obtaining,
from 198 high-school seniors, a list of 154
statements on attitudes towards teaching and
opinions on teachers. A group of 301 persons in
various professions and occupations sorted this
list of 154 statements into eleven piles which
represented a range from the highest to the
lowest appreciative attitudes. On the basis of
these sortings, scale values were assigned to
each of the statements. The final scale was
devised from the responses of 331 high school
seniors to whom the list of 154 statements was
submitted. Comparisons between a group of seniors
who had indicated a vocational preference for
teaching and other groups which had indicated
non-teaching vocational preferences showed the
performance of the teaching preference group on
this scale to be superior to that of the
non-teaching preference groups.

Splitting this scale in half, Yeager reported 
a “stepped-up” correlation of reliability of .88 
for 100 cases. 

(6) The Morris Trait Index L* (by Elizabeth H.
Morris) was designed to measure the trait of
leadership as defined by Miss Morris.


* Described in “The Teacher and Society”, [...] Yearbook
of the John Dewey Society ([...], 1937), Ch. VIII. Test
used by [...] Hartmann.

* Tressa C. Yeager, An Analysis of Certain Traits of
Selected High School Seniors Interested in Teaching,
Contributions to Education, No. 660 (New York: Bureau of
Publications, Teachers College, Columbia University, 1935).

* Published by Public School Publishing Co., Bloomington,
Illinois.


This test consists of five sections: (1) Sec- 
tion I is composed of 38 items, such as “study- 
ing”, “reading”, “having responsibility” and 
so on, for which the subject is to indicate for 
each item one of five degrees of feelings; 
(2) Section II consists of 14 comments often 
made by teachers to pupils, such as, “Merely 
satisfactory work isn’t enough.” For each of 
these statements, the subject is to indicate to 
which type of pupil (bright, dull, careless, 
lazy, bluffers, conscientious) the comment is 
appropriate; (3) Section III lists 15 different 
classroom situations, each of which is to be 
interpreted as being either amusing, embar- 
rassing, necessitating firm control, interesting, 
or necessitating correction of mistake; (4) 
Section IV is made up of 33 statements in- 
volving personal opinions each of which is 
answered by one response from a five point 
scale ranging from “always true” to “never 
true;” and (5) Section V consists of 7 
multiple choice items which present situa- 
tions for which the subject is to indicate his 
attitude. 

Data on the original test, of which the 
above described is an adaptation, was obtained 
from a group of 754 persons, which included 
teachers rated as strong or weak by their prin- 
cipals and a group of 402 students from a 
state teachers college, among which were 178 
seniors. The scoring on this test was devel- 
oped by comparing the responses of strong 
and weak teachers and also by ascertaining 
the degree of success in practice teaching of 
some of the teachers-in-training. 

The object of this index as stated by Morris 
was, “to develop a measure of other than in- 
tellectual qualities which contribute signifi- 
cantly to the success of prospective high 
school teachers.” * 

(7) The Orientation Test Concerning
Fundamental Aims of Education, 1935 Revision,**
(by Alfred S. Lewerenz and Harry C.
Steinmetz) consists of 475 true-false items
that measure the teacher’s knowledge and be- 
liefs in the seven areas of human experience 
corresponding to the seven cardinal objec- 
tives of education. The items are either true 
or doubtful. Among the latter items, however, 
are many that are accepted as true by persons 
having personal prejudices, provincial loyal- 
ties, or common superstitions based upon folk- 


* Elizabeth H. Morris, Personal Traits and Success in
Teaching, Contributions to Education, No. 342 (New York:
Bureau of Publications, Teachers College, Columbia
University, 1929), p.

** Published by the Southern California School Book
Depository, Ltd., Los Angeles, California.







lore. According to the test manual, dogmatic 
and superstitious persons receive low scores 
while persons possessing a scientific outlook 
and an open mind receive high scores on this 
test. Many of the statements require consid- 
erable knowledge in the field, if even an open- 
minded person is to do more than guess 
whether the item is true or not; e.g., “Nitro- 
gen is necessary for plant growth;” “German 
silver is a mixture of silver and nickel;” or 
“The book ‘Mother India’ presents a true pic- 
ture of social life in India.” The total score 
used here is the average percentile rank of 
the nine subtests. For the nine subtests, the
authors reported a coefficient of reliability of
.89 for 152 cases.

(8) The Personality Inventory* (by 
Robert G. Bernreuter) was used in this 
experiment for the purpose of obtaining 
objective values for the traits measured by 
this inventory. 

In the construction of the Inventory, Bernreuter
had recourse to four previously constructed
tests: the Thurstone Neurotic Inventory; the
Laird Test for Introversion, Schedule C; the
Allport Ascendance-Submission Test; and the
Bernreuter Self-Sufficiency Test. The Inventory,
consisting of 125 items, each of which can be
scored Yes, No, or ?, may be scored on four
traits: neurotic tendency (B1-N);
self-sufficiency (B2-S); introversion-extraversion
(B3-I); and dominance-submission (B4-D).
Intercorrelations between scores on these traits
have demonstrated sufficient overlapping between
B1-N and B3-I, so that usually scoring is
confined to traits B1-N, B2-S, and B4-D.**

Using Hotelling’s method of factor analysis,
Flanagan has added two more scales to the
scoring of this Inventory;* these scales Flanagan
calls self-confidence (F1-C) and sociability
(F2-S) but fails to define them. Flanagan has
claimed that these two scales could for all
practical purposes replace the four scales
obtained by Bernreuter.**

For the four scales, Bernreuter reports a range
of reliabilities from .85 to .92. The reliability
for F1-C is given as .86; for F2-S as .78.*
Bernreuter, in the manual for the Inventory,
reports the coefficients of validity of his four
scales with the criterion, made up of the four
tests from which most of the items were drawn, as
ranging from .84 to 1.00. Such high correlations
can, of course, be expected since both correlates
are highly saturated with identical elements.
Lorge* and Flanagan* have indicated, however,
that the validity of the Inventory was still an
unsettled matter.


* Published by Stanford University Press, Stanford
University, California.
* R. G. Bernreuter, “Manual for the Personality
Inventory.”
* This conclusion has also been substantiated by Ross
Stagner, “Validity and Reliability of the Bernreuter
Personality Inventory,” Journal of Abnormal and Social
Psychology, XXVIII (Jan.-March, 1934), pp. 413-418; and by
F. St. Clair and J. C. [...], “Certain [...] the Bernreuter
Personality Inventory,” Journal of Educational Psychology,
XXVII (1937), pp. 530-540.
* John Flanagan, “Factor Analysis in the Study of
Personality” (1935), Stanford University, California.
* Ibid., p. 73.


(9) The Social Adjustment Inventory (Sapich
Edition)* (by J. N. Washburne), consisting of
123 items, some of which call for more than one
response, was designed to measure the traits of
truthfulness, sympathy, alienation, purpose,
impulse-judgment, control, happiness, and wish.


This Inventory was standardized from the 
scores of four groups: (a) public school chil- 
dren divided upon teachers’ and principal’s 
estimates of exceptionally good or poor ad- 
justments; (b) public school children divided 
upon bases of good or poor deportment marks; 
(c) feebleminded boys and girls having made 
favorable or unfavorable adjustments; and, 
(d) a group of prisoners. By comparing the 
responses of these groups it was ascertained 
that the best adjusted individuals made the 
highest adjustment scores on the Inventory, 
the next best adjusted group made the second 
best adjustment scores, etc. 


The reliability of the total adjustment score
was found by Washburne to be .90; the
intercorrelations between traits were found to be
negligible; and the validity of the Inventory,
obtained by correlating scores obtained by
prisoners and by well-adjusted individuals
(bi-serials), was found to be .90.*


(10) The Stanford Educational Aptitudes 
Test,** (by Milton B. Jensen) is composed of 
three sections: (1) Position Preference Rat- 
ings; (2) Discipline Case Problems; and 
(3) High School Activities. 

* R. G. Bernreuter, “Manual for the Personality
Inventory.”

* Irving Lorge, “Personality Traits by Fiat II:
[...]rection,” Journal of Educational Psychology, XXVI
(1935), pp. [...]-54.

* John C. Flanagan, “Technical Aspects of Multi-Trait
Tests,” Journal of Educational Psychology, XXVI (1935),
pp. 641-51.

* Published by J. N. Washburne, Syracuse University,
Syracuse, N. Y.

* J. N. Washburne, “A Test of Social Adjustment,”
Journal of Applied Psychology, XIX (1935), pp. 125-144.

** Published by Stanford University Press, Stanford
University, California.







fraternity at Stanford University, and a list 
of city school superintendents, upon their 
abilities as administrators, teachers, and re- 
search workers. To the groups so rated, con- 
taining a total of 307 cases comprising the 
upper and lower 27% in each group, were 
addressed personal report blanks which were 
filled out and returned by 205 persons. On 
the basis of these replies, three scales were 
constructed to measure differences between 
teaching and research abilities, administrative 
and research abilities and teaching and admin- 
istrative abilities. The coefficients of reliability 
of the three scales were obtained from scores 
of persons not rated originally by the judges, 
and were used to obtain weights for the vari- 
ous test items while the validity of the scales 
was obtained by analyzing the responses made 
from the completed personal report blanks. 

For the three traits measured by this test,
the author reported reliability coefficients of
.85, .94, and .91 for the T-R, A-R, and T-A
abilities respectively.**

(11) The Test of Teaching Problems* (by T. L.
Torgerson) consists of two parts: Part I is
composed of 16 teaching problems for which the
subject indicates, from a supplementary list of
possible solutions, those procedures that would
be used in correcting the pupil behavior
involved; Part II consists of a list of common
teaching processes or practices, such as, “Give
the same assignment to all pupils,” and “Visit
pupil’s home.” The degree to which the teachers
follow these practices is indicated on a five
point scale (from “always” to “never”).

This test was used in order to obtain some 
measure of the teaching practices employed 
by the co-operating teachers. 

(12) The Theory and Practice of Mental 
Hygiene* (by T. L. Torgerson) was designed 
to obtain diagnostic cues as to whether teach- 
ers could differentiate between causes and 
symptoms of pupil maladjustment, as to how 

* M. B. Jensen, “Objective Differences Between Three
Groups in Education (Teachers, Research Workers, and
Administrators),” Genetic Psychology Monographs, III
(1928), pp. 335-454.
* Manual of Directions for Stanford Educational Aptitudes
Test.
* School of Education, University of Wisconsin, Madison,
Wisconsin (mimeographed).
* School of Education, University of Wisconsin, Madison,
Wisconsin (mimeographed).


teachers would go about correcting maladjust- 
ments if the causes were known, as to whether 
teachers could differentiate between behavior 
patterns as demonstrated by pupil action, and 
also what disciplinary procedures are used by 
teachers. 


This test is composed of four parts. In the 
first part the subject is given a list of 40 
statements such as, “Carelessness in school 
work,” and “unnecessary tardiness” for which 
the teacher indicates whether these statements 
are causes or symptoms of pupil maladjust- 
ment. In the second part, 47 common symp- 
toms of pupil maladjustment are listed for 
which the subject is to indicate the proper 
remedial procedure. Eleven specific pupil be- 
havior situations are listed in the third part. 
For each of these, the subject is to supply, 
from a list of 13 behavior patterns, that pat- 
tern which most closely conditions the specific 
pupil behavior. The fourth part consists of a 
list of 34 commonly employed disciplinary 
procedures from which the subject is to check 
the one most frequently used. 


The following tests: 


13. Abilities to Organize Research Mate- 
rial (by J. W. Wrightstone) 

14. Test on Community Planning, Form A 

15. Safeguarding Public Health, Form A 


which were described in the preceding section, 
were also taken by the participating teachers. 


RATING SCALES 


The teaching ability of the teachers in this 
experiment was rated by their principals or 
supervisors and by the experimenters on a 
battery of three rating scales. 

(1) The Almy-Sorenson Rating Scale for
Teachers* (by H. C. Almy and Herbert
Sorenson), consisting of 20 traits, was devised
from a list of traits submitted by 77 persons 
engaged in educational work. Each of the 20 
traits is divided into 10 points. Some of the 
traits to be rated are those of resourcefulness, 
enthusiasm, leadership, co-operation, etc. The 
authors of this rating scale report a coefficient 
of reliability of .92 for two ratings by the 
same raters from 110 practice teachers. 

(2) The Michigan Educational Association
Teacher Rating Scale** (by the Michigan
Education Association) consists of 10 traits

* Published by the Public School Publishing Co.,
Bloomington, Illinois.
** Published by the Michigan Educational Association,
Lansing, Mich.







each of which is divided into a number of 
sub-items for which ratings on a 5 point scale 
(“very inferior” to “very superior”) are pos- 
sible. The ratings are numerically expressed 
and the total of the assigned values is used to 
interpret the teaching skill of the teacher. 
(3) The Diagnostic Teacher Rating Scale
of Instructional Activities* (by T. L. Torgerson)
consists of 16 traits each of which has
five parts. Each of these parts consists of a 
statement describing the classroom activity of 
the teacher being rated. The observer checks 
those activities most closely describing the 
teacher’s performance and the sum of the 


* Published by the Public School Publishing Co.,
Bloomington, Illinois.




values assigned to these statements is used to 
ascertain the teaching ability of the teacher 
rated. The coefficients of reliability obtained 
from two ratings by the same judges of two 
groups of teachers are given by Torgerson as
.86 and .89.


Barr, Torgerson et al.** found that each of
these scales, when applied twice by the same
superintendents to the same teacher, gave the
following coefficients of reliability:


Almy-Sorenson Rating Scale .................... .92
Michigan Teacher Rating Card .................. [not legible]
Torgerson Diagnostic Teacher Rating Scale ..... [not legible]


** Barr, Torgerson, et al., The Measurement of Teaching
Efficiency (New York: The Macmillan Co., 1935), p. 87.


TABLE VIa
COEFFICIENTS OF RELIABILITY OF TEACHER TESTS
(Participating Group)
N=28

Test

American Council Psychological Examination
Teachers College Psychological Examination
American Council Civics and Government Test
Social Attitudes of Secondary School Teachers
Yeager: Scale for Measuring Attitudes
Morris Trait Index L
Lewerenz-Steinmetz: Orientation Test
Bernreuter: Personality Inventory
Health, Unit I
Community Planning, Unit II
Wrightstone: Abilities to Organize Research

[The column of coefficients is not legible in the source.]

*Editor’s Note: The exceedingly low reliabilities here reported, compared with those in Table VIb,
would lead one to suspect some sort of error in administering or scoring these tests.




TABLE VIb
COEFFICIENTS OF RELIABILITY OF TEACHER TESTS
(Combined Groups)

Test

American Council Psychological Examination
Teachers College Psychological Examination
American Council Civics and Government Test
Social Attitudes of Secondary School Teachers
Yeager: Scale for Measuring Attitude
Morris Trait Index L
Lewerenz-Steinmetz: Orientation Test
Bernreuter: Personality Inventory

[The column of coefficients is not legible in the source.]




The coefficients of reliability for most of the
tests applied to the 28 participating teachers
are presented in Table VIa. These coefficients
were obtained by correlating the split halves of
the test results and stepping up the obtained
correlations by the Spearman-Brown prophecy
formula.

Test results for an additional group of
teachers, similar in training, experience, age,
grades taught, etc., to the 28 participating
teachers, were also available. This group had
taken, with several exceptions, the same battery
of tests which had been administered to the
participating teachers. Since the coefficients
of reliability for a larger sample should be more
valid, the test results of both groups were
combined in order to obtain new coefficients of
correlation. The coefficients, obtained in the
manner described above, are listed in Table VIb.

Examination of Tables VIa and VIb shows that in
most instances the coefficients of reliability
calculated from the combined groups are larger in
magnitude than those calculated from the group of
participating teachers.

A list of the tests and rating scales applied
to the teachers is given in Table VII.


TABLE VII
TESTS AND RATING SCALES APPLIED TO TEACHERS
(Teacher Tests)

Name of Test or Rating Scale                        Publisher

American Council Psychological Examination          American Council on Education, Washington, D. C.
Teachers College Psychological Examination          State Teachers College, St. Cloud, Minnesota
American Council Civics and Government Test         World Book Co., Yonkers-on-Hudson, N. Y.
Social Attitudes of Secondary-School Teachers       John Dewey Society, New York, New York
Health Test, Unit I                                 University of Wisconsin, Madison, Wisconsin
Test on Community Planning                          University of Wisconsin, Madison, Wisconsin
Wrightstone: Abilities to Organize                  Teachers College, Columbia University,
  Research Material                                   New York, New York
Yeager: Scale for Measuring Attitude Towards        Teachers College, Columbia University,
  Teachers and the Teaching Profession                New York, New York
Torgerson: Theory and Practice of                   University of Wisconsin, Madison, Wisconsin
  Mental Hygiene
Torgerson: Teaching Problems (Mimeo.)               University of Wisconsin, Madison, Wisconsin
The Bernreuter Personality Inventory                Stanford University Press, Stanford University, California
Morris Trait Index L                                Public School Publishing Company, Bloomington, Illinois
Washburne Social Adjustment Inventory,              Syracuse University, Syracuse, New York
  Sapich Edition
Stanford Educational Aptitudes Test                 Stanford University Press, Stanford University, California
Lewerenz-Steinmetz: Orientation Test                Southern California School Book Depository,
                                                      Los Angeles, California
Almy-Sorenson: Rating Scale for Teachers            Public School Publishing Company, Bloomington, Illinois
Torgerson Diagnostic Rating Scale of                Public School Publishing Company, Bloomington, Illinois
  Instructional Activities
Michigan Teacher Rating Scale                       Michigan Education Association, Lansing, Michigan


SECTION III 


ESTABLISHMENT OF SEVERAL CRI- 
TERIA OF TEACHING ABILITY 


The first step in building a criterion score
of teaching ability for each teacher consisted
in calculating the mean and standard
deviation** of the final, initial, and change
scores of the pupils in each class for each of
the eight pupil tests.

Each set of 28 change score means derived 
for the pupils of the 28 classes represents the 
extent to which the goals of education as 
measured by that test were attained during 
the interval between the initial and final giv- 
ing of the test. Each set of 28 means can be 
used as a set of indices of teaching ability. 
However, in view of the fact that these eight 
separate sets of scores are probably not unre- 
lated and since the reliability of some of these 
tests taken singly is rather low, it was thought 
desirable to combine these eight measures into 
fewer and more reliable composites. 
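
The first step described above, a mean and standard deviation of the change scores for each class, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the original computation: the function name, the class labels, and the scores are hypothetical. `statistics.stdev` divides the sum of squares by N - 1, which appears to agree with the convention stated in the footnote on standard deviations.

```python
from statistics import mean, stdev


def class_change_summary(scores_by_class):
    """Mean and standard deviation of pupil change scores for each class.

    scores_by_class maps a class label to a list of (initial, final)
    score pairs, one pair per pupil (an assumed layout).
    """
    summary = {}
    for class_id, pairs in scores_by_class.items():
        changes = [final - initial for initial, final in pairs]
        # stdev uses N - 1 in the denominator (sample standard deviation).
        summary[class_id] = (mean(changes), stdev(changes))
    return summary


if __name__ == "__main__":
    # Invented scores for two small classes on one pupil test.
    data = {
        "Class 1": [(10, 14), (12, 15), (9, 14)],
        "Class 2": [(11, 12), (8, 13), (10, 10), (7, 11)],
    }
    for class_id, (m, s) in class_change_summary(data).items():
        print(class_id, round(m, 2), round(s, 2))
```

Repeating this for each of the eight pupil tests would give the eight sets of 28 class means that the text treats as separate indices of teaching ability.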


In order to decide in what manner these 
eight measures of pupil change should be com- 
bined, it was essential, first of all, to know the 
extent to which the goals of education they 
measure are related to each other. Some in- 
formation concerning this relationship was 
found in the size of the intercorrelations of 
the eight final tests with each other, the eight 
initial tests with each other, and the eight 
change scores with each other. These inter- 
correlations are given in Tables VIII, IX, 
and X. 

Because many of these intercorrelations 
appear to be quite low, it is well to consider 


* Data for the additional group were used with the per- 
mission of Mr. Gotham. 

** Throughout this study, standard deviations when
expressed for classes were obtained by using $\sqrt{N-1}$
in the denominators.




how large they must be in order to be
significantly different from zero. Since
correlations based on random samples of 375 pairs
of values from an uncorrelated population are
normally distributed about zero with a standard
deviation equal to $1/\sqrt{N-1} = 1/\sqrt{374}$,
which equals .0517, correlations in these tables
larger than .10 (roughly twice this standard
deviation) indicate the presence of a real
relationship with a considerable degree of
confidence.
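
The figure of .0517 can be checked with a short calculation. The sketch below assumes the large-sample approximation that the standard deviation of r under zero correlation is 1/sqrt(N - 1); that assumption reproduces the quoted value for N = 375. The function name is hypothetical.

```python
import math


def null_sd_of_r(n_pairs):
    """Approximate standard deviation of r when the true correlation is zero."""
    return 1.0 / math.sqrt(n_pairs - 1)


if __name__ == "__main__":
    sd = null_sd_of_r(375)
    print(round(sd, 4))      # standard deviation of r for 375 pairs
    print(round(2 * sd, 4))  # about two standard deviations from zero
```

Twice this standard deviation is a little over .10, which is presumably why correlations above .10 are treated in the text as indicating a real relationship.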


An inspection of these three tables of inter- 
correlations seems to show that there is a 
greater degree of association between either 
the initial or final status of these pupils, relative
to the goals measured by these tests, than
there is between the progress the pupils made 
in the direction of the several goals during the 
course of this study. Since the size of inter- 
correlations between two measures depends, 
amongst other things, not only upon the de- 
gree to which the educational functions 
measured overlap, but also upon the reliabil- 
ity of each measure, the intercorrelations be- 
tween the change scores in Table X would 
probably be considerably higher if the change 
measures had been more reliable. 


The intercorrelations recorded in Tables 
VIII, IX, and X show, then, that although 
the measures there listed have something in 
common, there is no great degree of over- 
lapping between any particular combination 
of the eight tests. In other words, these inter- 
correlations by themselves do not reveal any 
clear-cut manner in which these tests may be 
composited. Consequently, in order to arrive 
at a meaningful way of combining these tests, 
it becomes necessary to supplement these 
intercorrelations with other considerations. 


It will be recalled that the Wrightstone and 
Hill tests were applied at the beginning and 
end of a six-month teaching period, whereas 
the Unit tests were applied at the beginning 
and end of three-week teaching periods. The 
former, then, measure long-time changes; the 
latter, short-time changes. Furthermore, since 
the unit tests, as described in Section II, 
measure more specific outcomes of learning 
than do the other tests, it would appear logical 
to combine these two tests to furnish a single 
measure of teaching ability as manifested by 
specific short-time pupil changes. 


[TABLE VIII. Intercorrelations between pupil final test scores (raw score). N = 375, except for self-correlations. Tests tabulated: Health (Unit I), Community Planning (Unit II), Wrightstone Abilities, Wrightstone Civic Beliefs, Wrightstone Generalizations, Hill Attitudes, Hill Information, Hill Action. Entries not legible in this copy.]

[TABLE IX. Intercorrelations between pupil initial test scores (raw score). N = 375, except for self-correlations. Same eight tests as Table VIII. Entries not legible in this copy.]

[TABLE X. Intercorrelations between pupil change scores (raw score). N = 375, except for self-correlations. Same eight tests as Table VIII. Entries not legible in this copy.]

[TABLE XI. Unweighted averages of intercorrelations of pupil raw scores, N = 375, for the Unit, Wrightstone, and Hill composites as final, initial, and change measures. Entries not legible in this copy.]

An examination of the three Wrightstone
and three Hill tests reveals that the Wrightstone
tests emphasize the non-informational
or “intangible” objectives, whereas the Hill 
tests, although labelled differently, seem to 
measure little more than information. Conse- 
quently, the three Hill tests may be combined 
to furnish a single measure of long-time infor- 
mational change. The remaining three Wright- 
stone tests, which emphasize non-informa- 
tional objectives, represent in combination a 
measure of the kind of objectives that modern 
educational thinking has especially assigned 
to the social studies. The three Wrightstone 
tests may then also be combined to furnish a 
measure of long-time, non-informational 
change. 

A crude but nevertheless informative picture
of how these three sets of tests are related
to each other may be obtained by calculating 
from Tables VIII, IX, and X, the average 
intercorrelations of the tests comprising the 
Unit set, the tests comprising the Wrightstone 
set, and the tests comprising the Hill set 
within each set and between sets. From Table 
XI, which lists these average correlations, we 
find that the average intercorrelation between 
the Unit set and Wrightstone set for final 
scores is .31. This value was obtained by aver- 
aging the six correlations in Table VIII repre- 
senting the intercorrelations of the two Unit 
tests with the three Wrightstone tests. The 
other entries in Table XI were similarly 
obtained. 
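The averaging described above can be sketched as follows. The correlations below are hypothetical placeholders and are not the entries of Table VIII; the labels U1, U2, W1, W2, W3 and the helper names are likewise illustrative assumptions.

```python
from itertools import combinations, product

# Hypothetical correlations between two Unit tests (U1, U2) and three
# Wrightstone tests (W1, W2, W3). These are NOT the values of Table VIII.
r = {
    ("U1", "U2"): 0.50,
    ("U1", "W1"): 0.30, ("U1", "W2"): 0.25, ("U1", "W3"): 0.35,
    ("U2", "W1"): 0.40, ("U2", "W2"): 0.20, ("U2", "W3"): 0.30,
    ("W1", "W2"): 0.45, ("W1", "W3"): 0.55, ("W2", "W3"): 0.35,
}

def corr(a, b):
    """Look up a correlation regardless of the order of the two tests."""
    return r[(a, b)] if (a, b) in r else r[(b, a)]

def mean_between(set_a, set_b):
    """Unweighted average of all correlations between two sets of tests."""
    pairs = list(product(set_a, set_b))
    return sum(corr(a, b) for a, b in pairs) / len(pairs)

def mean_within(tests):
    """Unweighted average of the correlations among the tests of one set."""
    pairs = list(combinations(tests, 2))
    return sum(corr(a, b) for a, b in pairs) / len(pairs)

unit = ["U1", "U2"]
wright = ["W1", "W2", "W3"]
print(round(mean_between(unit, wright), 2))  # average of six correlations
print(round(mean_within(wright), 2))         # average of three correlations
```

With two Unit tests and three Wrightstone tests, the between-set average rests on six correlations, as the text states.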

Inspection of Table XI shows that the cor- 
relation of the two Unit tests with each other 
is higher than their average correlation with 
either the three Wrightstone or three Hill tests 
on either the final, initial, or change measures. 
Similarly, the average intercorrelation of the 
three Hill tests with each other is higher than 
their average intercorrelations with the two 
Unit or three Wrightstone tests as final, initial, 
or change measures. The average correlation 
of the three Wrightstone tests as measures of 
change with each other is higher than the 
average intercorrelations with the unit and 
Hill tests as measures of change. Although 
this is not true for the Wrightstone tests as 
measures of final and initial status, the aver- 
age intercorrelations of the Wrightstone tests 
with each other in these two cases are essen- 
tially as great as their average intercorrela- 
tions with the Unit or Hill tests. It appears, 
therefore, that the manner of composition sug- 
gested by the considerations stated in the last 
paragraph leads to three sets of tests with the 
tests in each set bearing more resemblance to 


each other than they do to the contents of the 
other sets. It would then appear that these 
three composites are meaningful from the 
point of view of the kinds of objectives 
measured and the manner in which they are 
interrelated. 

Having settled upon the manner in which 
the tests are to be composited, the question 
arises as to what relative weights shall be 
given to the tests in each set. One could get 
a composite measure for each set of tests by 
simply adding together the raw scores each 
pupil made on the tests in each set. This 
would result, however, in assigning relative 
weights to the several tests in a set in direct 
proportion to their standard deviations. 
Since tests with many items tend to have 
larger standard deviations than do shorter 
tests, this procedure would result in weighting 
tests in proportion to their length regardless 
of other considerations. 

Neither wishing to do the above nor know- 
ing of any reason why one test in a set should 
be given more weight than any other test in 
the same set, it was thought advisable to 
weight the tests within each set equally. Such 
a composite can be obtained by dividing the 
scores a pupil makes on the several tests of a 
given set by the respective standard deviations 
of the distribution of scores for those tests and 
adding the resulting quotients to get a single 
composite score for each pupil. 

In getting a composite of three raw initial
test scores, one would ordinarily divide X − M
by the standard deviations of the distributions
of the initial test scores; for the
composite of the final raw scores, one would
divide X − M by the standard deviations of the
distributions of final raw scores. The standard
deviations of the eight tests as initial tests
are not greatly different from their standard
deviations as final measures.*** Since these
differences are small and since the use of a 
uniform standard deviation for obtaining the 
final, initial, and change standard scores of a 
single test furnishes a convenient check on 
arithmetical calculations, it was decided to 
use a single estimate of variability for each 
test. This single standard deviation was obtained
by pooling both the sum of squares of
deviations around the means and degrees of
freedom for the final and initial distributions
of scores for each test and calculating from
these pooled quantities a single standard
deviation. Table XII lists the standard
deviation of final and initial scores for each
test as well as the pooled standard deviations
as outlined above.

The three composites, subsequently to be referred to as
the Unit, Wrightstone, and Hill composites, will be
designated U, W, and H respectively.

… raw scores are proportional to
their standard deviations. T. L. Kelley, “Statistical Method,”
(New York: The Macmillan Co., 1923), Section 34, pp.

*** Table A, (Appendix A), Original Thesis on file, Library,
University of Wisconsin.
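The pooling procedure just described can be sketched in a few lines. The scores below are hypothetical and the function names are illustrative; only the method (pooled sum of squares divided by pooled degrees of freedom) is taken from the text.

```python
import math

def sum_sq_dev(scores):
    """Sum of squared deviations around the mean, and degrees of freedom."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores), len(scores) - 1

def pooled_sd(final_scores, initial_scores):
    """S.D. from pooled sums of squares and pooled degrees of freedom."""
    ss_f, df_f = sum_sq_dev(final_scores)
    ss_i, df_i = sum_sq_dev(initial_scores)
    return math.sqrt((ss_f + ss_i) / (df_f + df_i))

# Hypothetical scores for one test (not data from the study).
final = [12, 15, 9, 14, 10]
initial = [8, 11, 6, 9, 6]
print(round(pooled_sd(final, initial), 3))  # 2.345

# The pooled variance of 59.48 reported for the Health Test corresponds
# to the pooled S.D. of 7.712 reported with it.
print(round(math.sqrt(59.48), 3))           # 7.712
```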


Using these pooled standard deviations, the
final, initial, and change raw scores for each
pupil were combined to get final, initial, and
change composite scores for the Unit, Wrightstone,
and Hill sets of tests. The following
example illustrates the manner in which the
various composite scores were calculated for
each pupil:

Let us assume the following table to represent
the raw scores of a particular pupil:

                    Final   Initial   Change
Health                67       44       23
Comm. Plan.           84       67       17
Wright. Abil.        110       97       13
Wright. Beliefs      129       81       48
Wright. Gen.           …        …        …
Hill Att.             19       12        7
Hill Inform.          16        8        8
Hill Action            …        …        …

The following nine composite scores were
calculated by dividing the raw scores by the
corresponding pooled standard deviations, as
given in Table XII, and then summing as
follows:

Unit Composite Final Score = $\frac{67}{7.712} + \frac{84}{8.414}$

Unit Composite Initial Score = $\frac{44}{7.712} + \frac{67}{8.414}$

Unit Composite Change Score = $\frac{23}{7.712} + \frac{17}{8.414}$

Wrightstone Composite Final Score = $\frac{110}{21.03} + \frac{129}{13.75} + \ldots$

Wrightstone Composite Initial Score = $\frac{97}{21.03} + \frac{81}{13.75} + \ldots$

Wrightstone Composite Change Score = $\frac{13}{21.03} + \frac{48}{13.75} + \ldots$

Hill Composite Final Score = $\frac{19}{2.715} + \frac{16}{3.128} + \ldots$

Hill Composite Initial Score = $\frac{12}{2.715} + \frac{8}{3.128} + \ldots$

Hill Composite Change Score = $\frac{7}{2.715} + \frac{8}{3.128} + \ldots$

(The third term of each Wrightstone and Hill sum, and the corresponding rows of the table of raw scores, are not legible in this copy.)

The following illustrates the method whereby the pooled
standard deviation was obtained for the Health Test (Unit I):
Variance = Total Σ(X − M)² / Total D. F. = 59.48; S. D. = 7.712.
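The arithmetic of the worked example can be sketched for the Unit composite, whose two terms are fully legible. The raw scores and pooled standard deviations are those of the example; the dictionary layout and the function name are illustrative assumptions.

```python
# Raw scores and pooled S.D.s for the two Unit tests, as legible in the
# worked example; the data layout and names here are illustrative only.
POOLED_SD = {"Health": 7.712, "Comm. Plan.": 8.414}
RAW = {
    "Health":      {"final": 67, "initial": 44},
    "Comm. Plan.": {"final": 84, "initial": 67},
}

def composite(kind: str) -> float:
    """Sum of raw scores, each divided by its test's pooled S.D."""
    total = 0.0
    for test, scores in RAW.items():
        if kind == "change":
            raw = scores["final"] - scores["initial"]
        else:
            raw = scores[kind]
        total += raw / POOLED_SD[test]
    return total

for kind in ("final", "initial", "change"):
    print(kind, round(composite(kind), 2))
```

Because one standard deviation is used per test for final, initial, and change scores alike, the change composite equals the final composite minus the initial composite, which is the arithmetical check the text mentions.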


The three sets of 28 class means of the 
Unit composite change scores, Wrightstone 
composite change scores, and Hill composite 
change scores, represent the average progress 
made by the pupils of the various classes to- 
wards the composite educational goals as 
measured by the tests applied and therefore 
represent three criteria of teaching ability. 


TABLE XII

COMPARISON OF FINAL, INITIAL, AND POOLED STANDARD DEVIATIONS FOR EACH
OF THE PUPIL TESTS

Test                              Initial Test S. D.
Health (Unit I)                         7.05
Community Planning (Unit II)            8.24
Wrightstone Abilities                  20.28
Wrightstone Beliefs                    13.44
Wrightstone Generalization             13.27
Hill-Attitudes                          2.66
Hill-Information                        3.02
Hill-Action                             2.88

(The Final Test S. D. and Pooled S. D. columns are not legible in this copy.)

It will be recalled that one of the reasons
for making these composites was to increase
the reliability of the criterion of teaching
ability. The reliabilities of the final and initial
composites as well as the change composites
were calculated from the reliabilities and
intercorrelations of the raw measures according
to a formula given by Kelley. (Table XIII)

TABLE XIII

RELIABILITIES OF COMPOSITES OF PUPIL SCORES

                          Final   Initial   Change
Unit Composites             …       .78      .37
Wrightstone Composites      …       .93      .75
Hill Composites             …        …       .44

A comparison of these reliabilities with the
reliabilities of the raw measures, as listed in
Table IV, reveals that the reliabilities of the
composite measures are in most cases substantially
higher than those for the original
single measures. In light of the above, it was
thought advisable to consider the possibility
of further combining these three composite
criteria into a single composite. It is true that
each set of tests measures an important area
of the total outcome of the educational goals
outlined in this study, but for purposes of this
investigation it appeared meaningful to obtain
also a single measure of the extent to which
the pupils approached these goals.

One could arrive at such a single composite
by assigning equal weights to each of the
three composites already obtained. Since the
reliabilities of two of these composites are
rather low, it was decided, however, to weight
each composite in such a manner that the
reliability of the resultant single composite be
a maximum.

Since, as will be subsequently seen, it is
necessary to make allowances in the mean
pupil gains for the variability of the mean
pretest scores of the various classes, it is
desirable to maximize not only the reliability
of the UWH composite gain, but also that of the
UWH composite of initial status.

Truman L. Kelley, “Statistical Method,” (New York:
Macmillan Co., 1923), p. 197, (Formula 147).

At a later stage in the development of the criterion of
teaching ability, it becomes necessary to equate pupil groups
on those factors which condition the extent to which they
… one of the factors is initial status, it is desirable
that the measure of initial status be as reliable as possible.


The problem of making a single composite 
of the three composites already arrived at then 
resolves itself into discovering that set of 
weights, $w_1$, $w_2$, and $w_3$, for the U, W, and H
composites respectively, which will simultane- 
ously maximize the reliabilities of the UWH 
composite as both a measure of initial status 
and of change. 


To solve this problem, separate sets of 
optimum weights were first obtained for the 
UWH composite of initial status and UWH 
composite gain as follows:*®* 

If we let $X_1, X_2, X_3, X_4, X_5, X_6, X_7$, and
$X_8$ represent the raw scores on the Health,
Community Planning, Ability to Organize
Research Materials, Scale of Civic Beliefs,
Generalizations, Hill Civic Attitudes, Hill
Civic Information, and Hill Civic Action tests
respectively, and $\mathrm{S.D.}_1, \mathrm{S.D.}_2, \ldots, \mathrm{S.D.}_8$ the
corresponding standard deviations, the weighted sum
whose reliability we wish to maximize becomes:

$$S_1 = w_1\frac{X_1}{\mathrm{S.D.}_1} + w_1\frac{X_2}{\mathrm{S.D.}_2} + w_2\frac{X_3}{\mathrm{S.D.}_3} + w_2\frac{X_4}{\mathrm{S.D.}_4} + w_2\frac{X_5}{\mathrm{S.D.}_5} + w_3\frac{X_6}{\mathrm{S.D.}_6} + w_3\frac{X_7}{\mathrm{S.D.}_7} + w_3\frac{X_8}{\mathrm{S.D.}_8}$$

where $w_1$ = weight of tests in Unit composite
      $w_2$ = weight of tests in Wrightstone composite
      $w_3$ = weight of tests in Hill composite

7 The UWH composite denotes the single measure into which 
U, W, and H composites will be combined. 

% The solution to follow which is an adaptation of Kelley’s 
general formulae for the correlation between weighted aver- 
ages, was suggested by Hellfritzsch. 







Letting the raw scores that might be obtained
upon a second administration of this
battery of tests be denoted by primes, the
weighted sum of scores of a second application
would become:

$$S_2 = w_1\frac{X'_1}{\mathrm{S.D.}_1} + w_1\frac{X'_2}{\mathrm{S.D.}_2} + w_2\frac{X'_3}{\mathrm{S.D.}_3} + w_2\frac{X'_4}{\mathrm{S.D.}_4} + w_2\frac{X'_5}{\mathrm{S.D.}_5} + w_3\frac{X'_6}{\mathrm{S.D.}_6} + w_3\frac{X'_7}{\mathrm{S.D.}_7} + w_3\frac{X'_8}{\mathrm{S.D.}_8}$$

The problem of maximizing the reliability of
this weighted sum is identical with maximizing
the correlation between $S_1$ and $S_2$, i.e.,

$$r_{S_1 S_2} = \frac{\sum S_1 S_2}{N \sigma_{S_1} \sigma_{S_2}}$$

with $S_1$ and $S_2$ expressed as deviations from their means.

Kelley has given a formula for calculating
$r_{S_1 S_2}$ if the intercorrelations of the type $r_{x_i x'_i}$,
$r_{x_i x_j}$, and $r_{x'_i x_j}$ (where $i \neq j$) as well as the
weights are known. In our case, the $r_{x_i x'_i}$ are
the reliability coefficients, and the $r_{x_i x_j}$ (taken
equal to the $r_{x'_i x_j}$) are the correlations of the eight
tests with each other and have already been
listed in Tables IX and X.

Since we are interested only in the relative
weights, we can let $w_3 = 1$, so that only $w_1$
and $w_2$ need to be determined. To get the
values of $w_1$ and $w_2$ which will make $r_{S_1 S_2}$ a
maximum, we need therefore to obtain the
simultaneous solution of two equations in
$w_1$ and $w_2$ which result when we set the partial
derivatives of $r_{S_1 S_2}$ with respect to $w_1$ and
$w_2$ each equal to zero, that is,

$$\frac{\partial r_{S_1 S_2}}{\partial w_1} = f(w_1, w_2) = 0$$

$$\frac{\partial r_{S_1 S_2}}{\partial w_2} = F(w_1, w_2) = 0$$

Truman Kelley, “Statistical Method,” (New York: Macmillan
Co., 1923), p. 198, (Formula 148).




The two equations $f(w_1, w_2) = 0$ and
$F(w_1, w_2) = 0$ turn out to be cubic equations
and were solved graphically.

The relative weights $w_1$, $w_2$, and $w_3$
which give the greatest reliability for the
UWH composite of initial status turn out
to be:

$w_1 = 3$
$w_2 = 7$
$w_3 = 1$


to the nearest integer. These weights, when 
substituted in Kelley’s formula, would give a 
UWH composite of initial status with a reli- 
ability of .94. 

The corresponding weights for maximizing
the reliability of the UWH composite change
are:

$w_1 = \tfrac{1}{2}$
$w_2 = 6$
$w_3 = 1$


to the nearest one-half. Using these weights, 
the reliability of the UWH composite as a 
measure of change would be .76. 

Since it was desired to use the same set of 
weights for both the initial and change com- 
posites, that set of weights was found (by 
trial and error) by exhausting all of the com- 
binations of integral weights such as (2, 8, 1), 
(1, 5, 1), etc., in the neighborhood of the 
above weights which resulted in initial and 
change reliabilities whose sum of deviations 
from the above reliabilities was a minimum. 
This set of weights turned out to be: 

$w_1 = 1$
$w_2 = 7$
$w_3 = 1$


and is the set which was used in combining 
the Unit, Wrightstone, and Hill composites 
into a single measure of both initial status 
and change. 
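The trial-and-error search over integral weights can be sketched as below. This is only an illustration under stated assumptions: the three composites are treated as standardized variables, their reliabilities and intercorrelations are hypothetical, the same intercorrelations are used for initial status and change, and the reliability formula is the usual one for a weighted sum with uncorrelated errors, offered as a stand-in rather than as Kelley's Formula 148 itself. Maximizing the sum of the two reliabilities is equivalent to minimizing the sum of their deviations from the two separate maxima.

```python
from itertools import product

def composite_reliability(w, rel, corr):
    """Reliability of a weighted sum of standardized components.

    w    : weights of the components
    rel  : reliabilities of the components
    corr : corr[i][j] = correlation between components i and j
    Assumes errors of measurement are uncorrelated across components.
    """
    k = len(w)
    cross = sum(w[i] * w[j] * corr[i][j]
                for i in range(k) for j in range(k) if i != j)
    true_part = sum(w[i] ** 2 * rel[i] for i in range(k)) + cross
    total = sum(w[i] ** 2 for i in range(k)) + cross
    return true_part / total

# Hypothetical inputs for the U, W, and H composites (not the study's data).
corr = [[1.0, 0.4, 0.3],
        [0.4, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
rel_initial = [0.78, 0.93, 0.80]
rel_change = [0.37, 0.75, 0.44]

def best_common_weights(max_weight=8):
    """Integer weights (w1, w2, 1) giving the largest summed reliability."""
    grid = [(w1, w2, 1)
            for w1, w2 in product(range(1, max_weight + 1), repeat=2)]
    return max(grid, key=lambda w: composite_reliability(w, rel_initial, corr)
               + composite_reliability(w, rel_change, corr))

weights = best_common_weights()
print(weights)
print(round(composite_reliability(weights, rel_initial, corr), 3))
print(round(composite_reliability(weights, rel_change, corr), 3))
```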

The values of the initial status and change 
reliabilities corresponding to each of the above 
three sets of weights are as follows: 


Reliabilities of UWH Composites as Measure of Initial Status and of Change

 Unit        Wrightstone   Hill        Initial
 Composite   Composite     Composite   Status    Change   Comments
 w1          w2            w3

 1/2         6             1           .936      …        This set of weights maximizes change reliabilities.
 3           7             1           .943      …        This set of weights maximizes initial status reliabilities.
 1           7             1           .941      …        This set of weights was used.




From the above it is seen that although the 
set of weights actually used does not furnish 
the mathematically maximal reliabilities in 
either the case of initial status or change, it 
leads to reliabilities that are practically 
identical with the maximal values if all are 
rounded to two decimal places. 

The mean changes for each class as repre- 
sented by the U composite, W composite, 
H composite, and UWH composite, which 
have thus far been discussed, have all been 
built up from the differences between the raw 
final and initial status scores the pupils made 
on the eight tests (two Unit tests, three 
Wrightstone tests, and three Hill tests). 

Below are listed the correlation coefficients
which were calculated, for the total group of
375 pupils, between initial status (i) and
change (c) as measured by the single tests
and the U, W, and H composites: $r_{ic}$ for
$U_1 = -.18$; $U_2 = -.44$; $W_1$ = …;
$W_2 = -.43$; $W_3 = -.54$; $H_1$ = …;
$H_2 = -.44$; $H_3 = -.49$; $U_c$ = …;
$W_c = -.46$; and $H_c = -.56$.

These coefficients are all negative and indi- 
cate the tendency for pupils with high initial 
status scores to gain less than pupils with low 
initial status scores. This tendency may be 
due to the fact that a test of finite length 
offers less opportunity for a pupil with a high 
initial test score to gain than it does for one 
with a low initial test score, or to the practice 
of many teachers of concentrating their teach- 
ing efforts upon the children in the class who 
are at the lower achievement levels, or to 
some other factors. Be that as it may, it remains
that a class of pupils having a low
initial test average would have a better chance
of making larger gains than one having a
higher initial test average. An index of teaching
ability based upon average pupil change
would then tend to favor the teacher whose
class began at a lower level of achievement.

[PLATE 1. Regression Curve of Average Gain on Pre-test Score for Unit Composite (Raw)]

Editor’s note: The reader may be interested in examining
the data on this point in subsequent studies of this report.

Since for the purpose of this study it is 
necessary to derive an index of teaching abil- 
ity which is free from such tendencies, it was 
thought desirable to adjust the raw gains 
made by each pupil in such a manner that 
the pupil’s chances of gaining a given amount 
as measured by the adjusted gains would be 
independent of the size of the particular initial 
status scores.”*» 


[PLATE 2. Regression Curve of Average Gain on Pre-test Score for Wrightstone Composite (Raw)]

[PLATE 3. Regression Curve of Average Gain on Pre-test Score for Hill Composite (Raw)]

The notation is as follows:
$U_1$ Health Test (Unit I)
$U_2$ Comm. Planning (Unit II)
$W_1$ Wright. Abilities
$W_2$ Wright. C. Beliefs
$W_3$ Wright. Generalizations
$H_1$ Hill Attitudes
$H_2$ Hill Information
$H_3$ Hill Action
$U_c$ Unit composite
$W_c$ Wright. composite
$H_c$ Hill composite
$r_{ic}$ Correlation between initial status and change







Plates I, II, and III show the regression 
curves of average change on initial test scores 
for the U, W, and H composite measures. 
These curves were drawn in such a manner 
that they gave a smooth curve fitted to the 
indicated points, as well as a free-hand curve 
might be. The indicated points represent the 
average gains made by from 35 to 95 pupils 
whose initial test scores lay within the par- 
ticular initial test score intervals correspond- 
ing to the points. These curves reflect the 
same tendency of pupils with high initial test 
scores to gain less and vice versa, which was 
revealed by the negative coefficients listed on 
the preceding page. 

An adjustment of individual pupil change 
was then based upon the assumption that the 
average raw changes made by groups of pupils 
having various average initial test scores are 
educationally equivalent. The adjustment 
made of each pupil’s U, W, or H composite 
changes was to divide the pupil’s observed raw 
change by the average raw change (read from 
the curve) made by the pupils having the 
same initial test score. Thus all pupils whose
combination of actual change and initial test
score, when plotted upon the graph, fell
exactly on the curve of regression would have
an adjusted change score equal to unity.
Those points falling below the curve would 
correspond to a value something less than one, 
and those above, something greater than one. 
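The adjustment amounts to a table lookup followed by a division. The following is a minimal sketch, assuming a hypothetical conversion table: the interval bounds and divisors below are invented for illustration and are not the entries of Tables XIVa, b, or c.

```python
import bisect

# Hypothetical conversion table: lower bounds of initial-score intervals and
# the average raw change read from the regression curve for each interval.
LOWER_BOUNDS = [9.0, 11.3, 12.8, 14.0, 15.0]
EXPECTED_CHANGE = [2.5, 2.2, 1.9, 1.6, 1.3]

def adjusted_change(initial: float, raw_change: float) -> float:
    """Raw change divided by the expected change for the pupil's interval."""
    if initial < LOWER_BOUNDS[0]:
        raise ValueError("initial score below the tabled range")
    idx = bisect.bisect_right(LOWER_BOUNDS, initial) - 1
    return raw_change / EXPECTED_CHANGE[idx]

print(adjusted_change(12.0, 2.2))   # on the curve: 1.0
print(adjusted_change(12.0, 3.3))   # above the curve: greater than 1
print(adjusted_change(15.5, 0.65))  # below the curve: less than 1
```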




Tables XIV, a, b, and c, list the divisors 
(taken graphically from the curve) for the 
various initial test intervals which were used 
in adjusting the raw changes as measured by 
the U, W, and H composite measures. 


TABLE XIVa

TABLE FOR CONVERTING UNIT COMPOSITE RAW
CHANGE SCORES INTO UNIT COMPOSITE
ADJUSTED CHANGE SCORES

If Initial Composite Score        Divide Raw Composite
Falls Between                     Change By

 9.00 and 11.29                   2.…
11.30 and 12.…                    …
12.80 and 14.…                    …
14.083 and 14.…                   …
15.00 and 15.…                    …
15.90 and 16.…                    …
16.58 and 17.…                    …
17.20 and 17.…                    …
17.830 and 18.…                   …
18.30 and 18.…                    …
18.79 and 19.…                    …
19.24 and 19.…                    …


The resultant three U, W, and H adjusted
composite changes were combined in a
1-7-1 ratio, as were the U, W, and H
raw composites, to give a UWH adjusted composite
change ($UWH_{a.c.}$) for each pupil. The
mean adjusted composite change for each class
and for the combination of all classes for the
U, W, H, and UWH composites are listed in
Table XV.

[TABLE XIVb. Table for converting Wrightstone composite raw change scores into Wrightstone composite adjusted change scores. Columns: If Initial Composite Score Falls Between; Divide Raw Composite Change By. Entries not legible in this copy.]

[TABLE XIVc. Table for converting Hill composite raw change scores into Hill composite adjusted change scores. Columns: If Initial Composite Score Falls Between; Divide Raw Composite Change By. Entries not legible in this copy.]

[TABLE XV. Means for classes on U, W, H, and UWH composites of adjusted change scores. Columns: Class; Unit Adjusted Change; Wrightstone Adjusted Change; Hill Adjusted Change; UWH Adjusted Change. Entries not legible in this copy.]

[TABLE XVI. Means and standard deviations of the pupil factors for classes and for the total group. Entries not legible in this copy.]

Thus, in the course of this section, eight 
numerically different measures of average 
pupil progress have been established for each 
class. They are as follows: 


1. Average U Composite Raw Change ($U_{r.c.}$)

2. Average W Composite Raw Change ($W_{r.c.}$)

3. Average H Composite Raw Change ($H_{r.c.}$)

4. Average UWH Composite Raw Change ($UWH_{r.c.}$)

5. Average U Composite Adjusted Change ($U_{a.c.}$)

6. Average W Composite Adjusted Change ($W_{a.c.}$)

7. Average H Composite Adjusted Change ($H_{a.c.}$)

8. Average UWH Composite Adjusted Change ($UWH_{a.c.}$)


Any one of these eight measures should represent
a valid measure of teaching ability
which might be used to rank the abilities of 
the several teachers studied, if the teaching 
situation (pupil capacities, environmental 
factors, etc.) in the 28 classrooms was every- 
where alike except for the one factor, teaching 
ability, which is under investigation. 


In addition to the scores on the initial and 
final administration of the eight Unit, Wright- 
stone, and Hill tests, it will be recalled from 
Section II that measures were obtained of 
pupil C.A., M.A., I.Q., reading, and socio-economic
status. Table XVI lists the mean
and standard deviation for each of these
factors for each class and for the total pupil
population. An inspection of this table reveals
that the means of these several pupil
factors vary considerably from one class to 
another. 


Since the 28 pupil groups should be homogeneous
with respect to the factors M.A.,
I.Q., reading, socio-economic status, U composite
initial test score, W composite initial
test score, H composite initial test score, and 
UWH composite initial test score, if one is to 
base an evaluation of the several teachers’ 
abilities upon the progress of these groups, 
tests of homogeneity were made. 

In one test, the variance between the means 
of the 28 classes was contrasted with the vari- 
ance within the 28 classes. The ratio of these 


two variances is the statistic F’* which enables 
one to determine whether or not the variation 
between class means is significantly greater 
than the variance within classes. If the value 
is larger than the 5% or 1% values, then the 
hypothesis that the 28 classes are homogene- 
ous is untenable. 
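The F ratio described above can be sketched as follows. The scores are hypothetical and the function name is illustrative; the calculation is the ordinary ratio of the between-class mean square to the within-class mean square.

```python
def f_ratio(classes):
    """Between-class mean square divided by within-class mean square."""
    k = len(classes)
    n_total = sum(len(c) for c in classes)
    grand_mean = sum(sum(c) for c in classes) / n_total
    means = [sum(c) / len(c) for c in classes]
    ss_between = sum(len(c) * (m - grand_mean) ** 2
                     for c, m in zip(classes, means))
    ss_within = sum((x - m) ** 2
                    for c, m in zip(classes, means) for x in c)
    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ms_between / ms_within

# Hypothetical scores for three small classes (not data from the study).
classes = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
print(f_ratio(classes))  # 75.0
```

For 28 classes and 375 pupils the corresponding degrees of freedom would be 27 and 347, and the computed value is compared with the tabled 5% and 1% values of F for those degrees of freedom.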

The other test was a chi-square test” to 
determine whether or not the variances of the 
pupil factors in the 28 classes were homo- 
geneous. It is conceivable that the 28 classes 
might have discrepant means but have homo- 
geneous variances. 
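A sketch of the chi-square test for homogeneity of variances follows, using the usual textbook form of Bartlett's statistic with its correction factor. The data are hypothetical and the function name is illustrative; whether the study applied exactly this computational form is an assumption.

```python
import math

def bartlett_chi_square(classes):
    """Bartlett's statistic for homogeneity of k variance estimates.

    Returns (chi_square, degrees_of_freedom).
    """
    k = len(classes)
    dfs = [len(c) - 1 for c in classes]
    variances = []
    for c in classes:
        m = sum(c) / len(c)
        variances.append(sum((x - m) ** 2 for x in c) / (len(c) - 1))
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    m_stat = (df_total * math.log(pooled)
              - sum(d * math.log(v) for d, v in zip(dfs, variances)))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return m_stat / correction, k - 1

# Hypothetical classes: discrepant means, identical variances.
same_spread = [[1, 2, 3], [11, 12, 13], [5, 6, 7]]
print(bartlett_chi_square(same_spread))

# Hypothetical classes: similar means, very different variances.
mixed_spread = [[4, 5, 6], [1, 5, 9], [-5, 5, 15]]
print(bartlett_chi_square(mixed_spread))
```

The first example illustrates the point made in the text: classes may have discrepant means and still have homogeneous variances, in which case the statistic is zero.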

Table XVII lists the values of F and chi- 
square resulting from these tests as well as 
the corresponding 5% and 1% values which 
are necessary to determine whether the cal- 
culated values refute the hypothesis of homo- 
geneity or not. 


TABLE XVII

F AND CHI-SQUARE VALUES USED TO DETERMINE
HOMOGENEITY OF PUPIL FACTORS
(28 CLASSES)

Pupil Factor                 F        Chi-square

…                            …        47.03
U Composite, initial         …        …
W Composite, initial         …        …
H Composite, initial         …        …
UWH Composite, initial       …        …

5% Value                     …        …
1% Value                     …        …



The values of F are uniformly much higher 
than the 1% value thus indicating that the 
means of the 28 classes, relative to these 
factors, are too discrepant to be adjudged 
homogeneous. The values of chi-square, how- 
ever, indicate that the variances of five of the 
eight factors are consistent with the hypothe- 


sis of homogeneous variances; the three 
factors whose chi-square values exceed the 
5% values are M.A., socio-economic status, 
and W composite-initial scores. 

In the original conception of this experiment,
this variation of pupil factors was
anticipated and it was planned to arrive at
homogeneous sets of pupils by eliminating the
least number of pupils necessary to arrive at …
28 classes might be all equated with each
other on a pupil-to-pupil basis or at least on
the basis of comparable means and standard
deviations. After a lengthy examination of
the pupil data, however, it appeared that the
means of the various classes differed so widely
that neither of these processes could be carried
out without eliminating most of the 375 pupils
involved.

The test of the significance of group means is described
by Snedecor in Section 10.4. G. W. Snedecor, Statistical
Methods (Ames, Iowa: Collegiate Press, 1938), pp. 182 ff.

A test for the homogeneity of several estimated variances
is given by M. S. Bartlett, “Properties of Sufficiency and
Statistical Tests,” Proceedings of the Royal Society of London,
Series A, Vol. 160 (1937), pp. 273 ff.

It seemed best, therefore, to control the 
pupil factors statistically. The eight measures 
of average pupil progress listed on page -- 
were each considered to be a function of three 
things: (1) the ability of the teacher; (2) the 
capacity, social background, maturity, and 
initial achievement levels of the pupils; and 
(3) other factors not measured. It was 
assumed that the 28 classes were comparable 
as far as the “other factors not measured” 
were concerned. If this is true then the meas- 
ures of average pupil progress are functions 
of teacher ability and pupil factors plus a 
constant factor. 

The criterion of teaching ability being 
sought is that portion of the average pupil 
progress which is ascribable to the teachers’ 
influence. For the purpose of measuring the 
teaching ability of one teacher relative to an- 
other, it makes no difference if all of the 
teacher measures are too large or small by a 
constant, the constant being the effect of the 
“other factors not measured”. The teacher 
effect plus this constant can be obtained from 
the average pupil change by subtracting from 
the latter that portion of the pupil change 
that can be mathematically ascribed by mul- 
tiple regression to the pupil factors. 

In order to control validly the variability of the pupil factors (class means of M.A., I.Q., reading, socio-economic status, and initial test measures) it is, however, necessary that the variances of the classes used in calculating the multiple regression equations be homogeneous. It will be recalled from Table XVII that the 28 classes can be considered as having homogeneous variances with respect to all the pupil factors except M.A., socio-economic status, and W composite-initial. To arrive at a set of classes for which the variances were homogeneous for all of these factors, it was found necessary to delete four classes. These classes were numbered 4, 11, 14, and 20. All subsequent calculations will be based, then, upon the remaining 24 classes, which have a total of 342 pupils.

By trial and error in eliminating those classes having extreme variances, it was found that this was the least number of classes which needed to be deleted in order that the remaining classes might represent a group whose variances were homogeneous on the several pupil factors.

Table XVIII lists the values of F and chi-square for testing the homogeneity of the means and standard deviations of the remaining 24 classes, together with the 5% and 1% values, which are necessarily based upon somewhat different degrees of freedom than for the case of 28 classes. Thus, by eliminating 4 classes, the value of chi-square for each of these factors, except for socio-economic status, which was not included in the original test but added later, is less than the 5% value and hence consistent, except as noted, with the hypothesis of homogeneous variance.

TABLE XVIII

F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL FACTORS
(24 CLASSES)

Columns: Pupil Factors; F; Chi-square.
Rows legible: Socio-Economic Status; U composite, initial; W composite, initial; H composite, initial; UWH composite, initial.
[Tabulated values not recoverable.]

In order to obtain a multiple regression equation of these factors on average pupil change, for the purpose of estimating that portion of pupil change attributable to these pupil factors, it was first necessary to calculate the intercorrelations with average change. Table XIX lists all of the correlations which were necessary for each factor together with the means and standard deviations of the 24 class means. These correlation coefficients were each calculated by correlating the appropriate set of 24 pairs of class means.

The next step in correcting the average 
composite changes as measured by one of the 
eight composites for variability in the meas- 
ured pupil factors is to obtain the prediction 
equation which enables one to calculate the 
average change one would expect a class with 
known capacities to make. To do this, eight 
different equations were obtained, one for 
each of the eight measures of pupil progress. 

"The correlation between means is equal to the correlation between single measurements." J. P. Guilford, Psychometric Methods (New York: McGraw-Hill, 1936), p. 373.







To illustrate the procedure, one may consider the case of average change as measured by the W composite raw change. The pupil factors which are to be statistically controlled, in this case, are M.A., I.Q., reading, socio-economic status, and W composite initial status. The correlations which are involved in obtaining this particular prediction equation are listed in Table XX.

The beta coefficients which are necessary to express the multiple regression equation of W composite raw change on these five pupil factors were obtained by the Doolittle method. The values of the beta coefficients turn out to be:

\[
\begin{aligned}
\beta_{01.2345} &= +.335 \\
\beta_{02.1345} &= -.087 \\
\beta_{03.1245} &= -.318 \\
\beta_{04.1235} &= -.072 \\
\beta_{05.1234} &= -.374
\end{aligned}
\]

The multiple correlation coefficient between these five pupil factors and W composite raw change was .52.
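The computation of beta coefficients from a table of intercorrelations can be sketched as below. This is an illustration only: plain Gaussian elimination stands in for the Doolittle layout (which organizes the same elimination for hand calculation), the function names are hypothetical, and the correlations are invented for a two-factor case rather than taken from Table XX.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def beta_weights(r_predictors, r_criterion):
    """Return (betas, multiple R) from predictor intercorrelations and
    predictor-criterion correlations."""
    betas = solve_linear(r_predictors, r_criterion)
    r_squared = sum(b * r for b, r in zip(betas, r_criterion))
    return betas, math.sqrt(r_squared)

# Hypothetical two-factor example: factors correlate .50 with each other
# and .60 and .40 with the criterion.
betas, multiple_r = beta_weights([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.4])
```

With five pupil factors the same call would take a 5 x 5 intercorrelation table and five criterion correlations.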


Using the standard deviations listed in Table XIX to convert these beta coefficients into b coefficients and calculating the value of the constant, the prediction equation between these variables can be written:

\[
X_0 = -.346X_1 - .035X_2 + .052X_3 - .031X_4 - .018X_5 + 2.87
\]

where X_0 = predicted average W composite raw change
      X_1 = W composite initial score, class mean
      X_2 = Reading score, class mean
      X_3 = M.A. score, class mean
      X_4 = Socio-Economic score, class mean
      X_5 = I.Q. score, class mean
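The conversion and the use of the prediction equation can be sketched as follows. The regression coefficients and constant are those printed in the equation for W composite raw change; the function names, the standard deviations passed to `beta_to_b`, and the class means are hypothetical, since the Table XIX values are not reproduced here.

```python
# Coefficients of the W composite raw-change equation as printed in the text,
# in the order X_1 (W initial), X_2 (reading), X_3 (M.A.), X_4 (S.E.S.), X_5 (I.Q.).
W_RAW_COEFFS = (-0.346, -0.035, 0.052, -0.031, -0.018)
W_RAW_CONSTANT = 2.87

def beta_to_b(beta, sd_criterion, sd_predictor):
    """Convert a standardized beta weight into a raw-score b coefficient."""
    return beta * sd_criterion / sd_predictor

def predicted_w_raw_change(w_initial, reading, ma, ses, iq):
    """Predicted average W composite raw change for one class."""
    xs = (w_initial, reading, ma, ses, iq)
    return sum(c * x for c, x in zip(W_RAW_COEFFS, xs)) + W_RAW_CONSTANT

# Hypothetical standard deviations (not from Table XIX).
b_example = beta_to_b(-0.374, sd_criterion=1.0, sd_predictor=2.0)

# Hypothetical class means (not from the study's data).
example = predicted_w_raw_change(w_initial=10, reading=50, ma=150, ses=20, iq=100)
```

The value returned for a class is the change expected from its pupil factors alone.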


The measure of initial status which was controlled in the case of each measure of average change was the initial status as measured by the same composite as the particular change, i.e., when adjusting U raw change or U adjusted change, the measure of initial status which was controlled was initial status as measured by the U composite; when dealing with W changes, the W composite of initial status was controlled; when dealing with H change, the H composite of initial status was controlled, etc.


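TABLE XIX

CORRELATIONS BETWEEN PUPIL FACTORS (M.A., I.Q., READING, SOCIO-ECONOMIC STATUS) AND COMPOSITE MEASURES OF CHANGE
(MEANS AND S.D.'S FOR ALL FACTORS INCLUDED) (N = 24)

[Tabulated values not recoverable.]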


The operations involved in the Doolittle method are fully explained in any of the following references: C. C. Peters and E. C. Wykes, "Simplified Methods of Computing Regression Coefficients and Partial and Multiple Correlations," Journal of Educational Research, XXIII (1931), pp. 383-393; H. A. Wallace and G. W. Snedecor, "Correlation and Machine Calculation" (Ames, Iowa: 1931), pp. 35-45; and J. P. Guilford, Psychometric Methods (New York: McGraw-Hill, 1936), pp. 393-397.







TABLE XX

CORRELATIONS EMPLOYED IN THE PREDICTION OF W COMPOSITE RAW CHANGES FROM PUPIL FACTORS

[Tabulated values not recoverable.]

TABLE XXI

BETA COEFFICIENTS AND MULTIPLE CORRELATION COEFFICIENTS FOR FIVE PUPIL FACTORS AND EIGHT CRITERIA

                                 Beta Coefficient for Factor
Dependent               1           2         3         4          5
Variable*           Composite    Reading     M.A.     Socio-      I.Q.     R0.12345
                    Initial                           Economic
                    Test                              Status

U composite R.C.       [?]        -.024     -.367     -.168      .849        [?]
W composite R.C.      -.374       -.318      .335     -.072     -.087        .52
H composite R.C.       [?]        -.326      .198     -.086      .123        [?]
UWH composite R.C.     [?]        -.353      .279     -.107      .028        [?]
U composite A.C.      -.078       -.020     -.361     -.165      .878        [?]
W composite A.C.      -.179       -.208      .295      .014      .014        [?]
H composite A.C.      -.486       -.374      .421      .071      .120        [?]
UWH composite A.C.     [?]        -.338      .314      .012      .035        [?]

*R.C. = Raw score change; A.C. = Adjusted score change.
[?] marks an entry not recoverable. The W composite R.C. entries for factors 1, 3, and 5 and for R follow the values stated in the text.


TABLE XXII

MULTIPLE REGRESSION EQUATIONS FOR PREDICTING AVERAGE PUPIL RAW AND ADJUSTED CHANGES ATTRIBUTABLE TO PUPIL FACTORS

\[
\begin{aligned}
X_{c1} &= -.173X_1 - .002X_2 - .044X_3 - .057X_4 + .055X_5 + 7.02 \\
X_{c2} &= -.346X_1 - .035X_2 + .052X_3 - .031X_4 - .018X_5 + 2.87 \\
X_{c3} &= -.743X_1 - .052X_2 + .045X_3 - .054X_4 + .036X_5 + 4.10 \\
X_{c4} &= -.821X_1 - .315X_2 + .354X_3 - .378X_4 - .047X_5 + 30.99 \\
X_{c5} &= -.045X_1 - .001X_2 - .024X_3 - .030X_4 + .033X_5 + 2.80 \\
X_{c6} &= -.117X_1 - .016X_2 + .082X_3 + .004X_4 - .002X_5 - 1.21 \\
X_{c7} &= -.871X_1 - .074X_2 + .119X_3 + .056X_4 - .045X_5 + 1.33 \\
X_{c8} &= -.085X_1 - .224X_2 + .296X_3 - .032X_4 - .043X_5 - 6.51
\end{aligned}
\]

In which
X_1 = Mean Pre Composite
X_2 = Mean Reading Score
X_3 = Mean M.A.
X_4 = Mean Socio-Economic Status Score
X_5 = Mean I.Q.
X_c1 = Predicted Unit Raw Change
X_c2 = Predicted Wrightstone Raw Change
X_c3 = Predicted Hill Raw Change
X_c4 = Predicted UWH Raw Change
X_c5 = Predicted Unit Adjusted Change
X_c6 = Predicted Wrightstone Adjusted Change
X_c7 = Predicted Hill Adjusted Change
X_c8 = Predicted UWH Adjusted Change







Having found this equation, the average W composite raw change to be expected for each of the 24 classes can be calculated.

By an exactly similar process, the predic- 
tion equations for the other seven measures 
of pupil progress were obtained. Table XXI 
lists the beta coefficients and multiple correla- 
tion coefficients corresponding to each solution 
and Table XXII lists all of the resultant pre- 
diction equations. 

By means of these eight prediction equa- 
tions, a predicted average pupil change was 
calculated for each of the eight measures of 
pupil change for each class. Table XXIII lists 
all of these predicted pupil changes. 

In Table XXIV are listed the means of the raw and adjusted pupil changes for the U composite, W composite, H composite and UWH composite.

If we let g_o equal the observed pupil changes, which were listed in Table XXIV, and g_p the predicted pupil changes, which are listed in Table XXIII, we obtain that portion of g_o which measures the relative teaching abilities of the several teachers, g_t, by subtracting g_p from g_o, that is,

\[
g_t = g_o - g_p
\]

Table XXV lists these differences for each teacher for each of the eight measures of pupil change. These eight measures will hereafter be referred to as the eight criteria of teaching ability.
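The subtraction can be sketched in a few lines. The class labels and values below are hypothetical and are not drawn from Tables XXIII to XXV; the second call illustrates the earlier point that a constant added to every predicted change leaves the relative standing of the teachers unaltered.

```python
def teacher_criteria(observed, predicted):
    """Return g_t = g_o - g_p for every class."""
    return {cls: observed[cls] - predicted[cls] for cls in observed}

def ranking(criteria):
    """Class labels ordered from highest to lowest criterion value."""
    return sorted(criteria, key=criteria.get, reverse=True)

# Hypothetical class means of observed and predicted change.
observed = {"A": 4.0, "B": 2.5, "C": 3.0}
predicted = {"A": 3.0, "B": 3.0, "C": 2.5}

g_t = teacher_criteria(observed, predicted)

# Shifting every prediction by the same constant changes each g_t
# by that constant but not the order of the teachers.
shifted = teacher_criteria(observed, {k: v + 0.7 for k, v in predicted.items()})
same_order = ranking(g_t) == ranking(shifted)
```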


TABLE XXIII

PREDICTED AVERAGE PUPIL CHANGE ON INDICATED COMPOSITES ATTRIBUTABLE TO PUPIL FACTORS OF M.A., I.Q., READING, AND SOCIO-ECONOMIC STATUS

Columns: Class*; Predicted Unit R.C.; Predicted Wrightstone R.C.; Predicted Hill R.C.; Predicted UWH R.C.; Predicted Unit A.C.; Predicted Wrightstone A.C.; Predicted Hill A.C.; Predicted UWH A.C.
[Tabulated values not recoverable.]

*These class numbers are those originally assigned; four classes were omitted to establish homogeneity.







TABLE XXIV

MEAN OF OBSERVED PUPIL CHANGE SCORES FOR INDICATED COMPOSITES
(Raw Score)

Columns: Class; U Composite R.C.; W Composite R.C.; H Composite R.C.; UWH Composite R.C.; U Composite A.C.; W Composite A.C.; H Composite A.C.; UWH Composite A.C.
[Tabulated values not recoverable.]


TABLE XXV

CRITERIA OF TEACHING ABILITY: PORTION OF AVERAGE PUPIL CHANGE ATTRIBUTABLE TO TEACHER EFFECT AND CONSTANT FACTORS
(Raw Score)

Columns: Class; U Composite R.C.; W Composite R.C.; H Composite R.C.; UWH Composite R.C.; U Composite A.C.; W Composite A.C.; H Composite A.C.; UWH Composite A.C.
[Tabulated values not recoverable.]







C_1 = g_t (derived from U composite R.C.)
C_2 = g_t (derived from W composite R.C.)
C_3 = g_t (derived from H composite R.C.)
C_4 = g_t (derived from UWH composite R.C.)
C_5 = g_t (derived from U composite A.C.)
C_6 = g_t (derived from W composite A.C.)
C_7 = g_t (derived from H composite A.C.)
C_8 = g_t (derived from UWH composite A.C.)


Each of these criteria is a measure of that portion of the average pupil gain which is attributable to the effect of the teacher plus a quantity assumed to be constant for all the teachers under investigation. The 24 indices listed in any one of the eight columns represent a set of comparable measures of teaching ability based upon pupil groups which were rendered comparable by statistically controlling the effect upon pupil change of the variable pupil factors: M.A., I.Q., reading, socio-economic status, and initial status.


It is interesting to note that, for the 24 
teachers who make up the sampling from 
which Table XXV was obtained, six teachers 
had negative scores in all the criteria of teach- 
ing ability whereas four teachers had positive 
scores and were above average in all of the 
criteria. For the remaining teachers, no such 
clear trends are indicated. Most of the teach- 
ers appear to vary in their teaching abilities 
and show tendencies to be above average in 
some of the criteria and below average in 
other criteria. 


The distinctive value of the criteria of 
teaching ability presented here lies in the 
fact that ability is ascertained not by sub- 
jective supervisory ratings which permit of 
halo effects, shifting standards of evaluation, 
and influences extraneous to the teaching 
process itself, but by objective instruments 
of measurement impartially applied to the 
pupils in actual class-room situations. 


SECTION IV 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES 


The purpose of the previous section was 
to establish criteria of teaching ability arrived 
at objectively and upon the assumption that 
changes produced in pupils are the most de- 
sirable criteria of teaching ability. The next 
step is to determine the statistical validity 
of the various teacher measures by intercor- 
relating all of the teacher measures with each 
of the eight criteria of teaching ability. 

It will be recalled from Section III that scores on the following measures were obtained for each of the 28 teachers. The measures used and the notation to be applied to them are as follows:

T_1  Wrightstone: Abilities to Organize Research Materials
T_2  American Council Psychological Examination
T_3  Social Attitudes of Secondary School Teachers
T_4  Yeager: Scale of Attitude Towards Teachers
T_5  Torgerson: Mental Hygiene
T_6  Teachers College Psychological Examination
T_7  Test on Community Planning (Unit II)
T_8  Health Test (Unit I)
T_9  American Council Civics and Government Test
T_10 Torgerson Diagnostic Teacher Rating Scale (Investigator)
T_11 Bernreuter Personality Inventory: Bn
T_12 Orientation Test
T_13 Bernreuter Personality Inventory: Fc
T_14 Almy-Sorenson Rating Scale for Teachers (Investigator)
T_15 Bernreuter Personality Inventory: Bd
T_16 Michigan Rating Scale (Investigator)
T_17 Morris Trait Index L
T_18 Bernreuter Personality Inventory: Bs
T_19 Michigan Rating Scale (Supervisor)
T_20 Bernreuter Personality Inventory: Fs
T_21 Washburne: Social Adjustment Inventory
T_22 Torgerson: Teacher Problems
T_23 Stanford Educational Aptitudes Test T-A
T_24 Stanford Educational Aptitudes Test A-R
T_25 Almy-Sorenson Rating Scale for Teachers (Supervisor)
T_26 Stanford Educational Aptitudes Test T-R
T_27 Torgerson Rating Scale (Supervisor)

For a fuller description of these see pp. 15-20.


In order that the arithmetical work in calculating this large number of correlation coefficients might be facilitated, each raw score for each teacher as well as each criterion score for the group of 24 teachers was converted into a standard score by subtracting from it the proper mean and then dividing the difference by the proper standard deviation of the distribution of scores for the 24 teachers, i.e.,

\[
z = \frac{X - M}{\mathrm{S.D.}}
\]

where X is a raw score and M and S.D. are the mean and standard deviation of its distribution. The standard scores thus obtained for the criteria of teaching ability are presented in Table XXVI.

The coefficient of correlation between any two variables could then be obtained by use of the formula

\[
r_{xy} = \frac{\sum z_x z_y}{N}
\]
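The shortcut can be sketched as follows, assuming the standard deviation is computed with N (not N - 1) in the denominator, which is what makes the mean product of standard scores equal the product-moment correlation. The function names and scores are hypothetical.

```python
import math

def standard_scores(values):
    """Convert raw scores to standard scores using the N-denominator S.D."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation_from_standard_scores(zx, zy):
    """r = (sum of products of paired standard scores) / N."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Hypothetical scores for five teachers on one measure and one criterion.
measure = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

r = correlation_from_standard_scores(
    standard_scores(measure), standard_scores(criterion)
)
```

Once every measure and criterion has been standardized, each further correlation needs only one pass of multiplications and a division by N.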




Table XXVII lists all of the resultant correlations of (1) the criteria with each other, (2) the teacher measures with the criteria, and (3) the teacher measures with each other. Interpretation of Table XXVII will be made in this order.


(1) Interpretation of intercorrelations of criteria with each other. Consideration of the intercorrelations of the criteria with each other raises the pertinent problems of determining whether any significant relationships exist between raw and adjusted pupil changes for each of the four types of composites used, U, W, H, and UWH, and whether any significant relationships exist between the raw and adjusted pupil changes for the several combinations of composites.


The correlations between raw and adjusted changes for the four types of composites may be presented as follows:

Composite        r (R.C.)(A.C.)


TABLE XXVI

CRITERIA OF TEACHING ABILITY: PORTION OF AVERAGE PUPIL CHANGE ATTRIBUTABLE TO TEACHER EFFECT AND CONSTANT FACTORS
(Standard Scores)

Columns: Class; U Composite R.C.; W Composite R.C.; H Composite R.C.; UWH Composite R.C.; U Composite A.C.; W Composite A.C.; H Composite A.C.; UWH Composite A.C.
[Tabulated values not recoverable.]







TABLE XXVII

INTERCORRELATIONS AMONG THE TEACHER MEASURES AND THE EIGHT CRITERIA (N = 24)

[Tabulated values not recoverable.]

From the above it is seen that the corre- 
lations between raw change and adjusted 
change scores for all of the composites, ex- 
cept for the H composite, are high. One can 
safely conclude from this that the use of raw 
change scores is as desirable as the use of 
adjusted change scores, providing that the 
variability of the other pupil factors is con- 
trolled. The fact that making adjustments to 
raw scores is very time consuming should 
speak favorably for the use of raw change 
scores for which pupil factors have been 
statistically controlled. 

In order to ascertain whether adjusted changes are differently interrelated than raw changes are interrelated, the significance of the difference between the obtained correlations should be calculated. Fisher's z test, used to measure the significance of differences between correlations, was applied to all pairs of correlations with the result that no difference between correlations was found to be significant. This again is evidence of the fact that one may use either raw or adjusted change scores, but because of the relative ease with which the former can be obtained and used, it should be the more desirable of the two methods for employing pupil change scores.

R. A. Fisher, op. cit., Section 35.
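The z test can be sketched as below, using the footnoted case of the correlations .980 and .959 with 24 teachers. The sketch assumes the usual form of the test for two correlations, each contributing a sampling variance of 1/(N - 3), and adopts the "twice the standard error" criterion stated in the footnote; the function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = arctanh(r)."""
    return math.atanh(r)

def z_difference_test(r1, r2, n1, n2):
    """Return (difference in z, standard error of the difference,
    whether the difference exceeds twice its standard error)."""
    diff = abs(fisher_z(r1) - fisher_z(r2))
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff, se, diff > 2 * se

diff, se, significant = z_difference_test(0.980, 0.959, 24, 24)
```

Computed directly, the difference in z comes to roughly .36, slightly below the .3666 reported in the footnote (which may reflect the rounding of a printed z table), and the standard error agrees with the reported .3086; either way the difference is well short of twice its standard error.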


(2) Interpretation of intercorrelations between teacher measures and the criteria. To facilitate interpreting the intercorrelations between the various teacher measures and the criteria, it would be expedient to consider the teacher measures as falling under the following categories:

The largest difference of the values of z, that for the correlations of .980 and .959, was .3666 and did not exceed twice the standard error of difference (.3086). The difference is then not statistically significant. The discussion of the adjusted scores has been retained for those that may be interested. See original thesis on file, University of Wisconsin Library, for these correlations.





TABLE XXVII (Continued)

[Tabulated values not recoverable.]


(a) Supervisory Rating Scales
    1. Ratings by supervisors
    2. Ratings by investigator
(b) Intelligence
    1. American Council Psychological Examination
    2. Teachers College Psychological Examination
(c) Personality
    1. Bernreuter Personality Inventory
    2. Morris Trait Index L
    3. Washburne Social Adjustment Inventory
(d) Attitude Towards Teaching
    1. Yeager: Attitude Towards Teachers and the Teaching Profession
(e) Knowledge of Subject Matter
    1. Health Test (Unit I)
    2. Test on Community Planning (Unit II)
    3. Wrightstone: Abilities to Organize Research Materials
    4. American Council Civics and Government Test
(f) Social Attitudes
    1. Social Attitudes of Secondary School Teachers
(g) Mental Hygiene
    1. Torgerson: Test of Mental Hygiene
(h) Ability to understand disciplinary problems
    1. Torgerson: Test of Teaching Problems







(i) Teaching Aptitude
    1. Stanford Educational Aptitudes Test
(j) General information and freedom from superstitions and prejudices
    1. Lewerenz-Steinmetz Orientation Test


The interpretation of the intercorrelations between teacher measures and the criteria, given in Table XXVIII as a subset from Table XXVII, can be further facilitated when it is determined how large a correlation must be in order that it be significantly different from zero. For 24 pairs of variates, a correlation of .40 is statistically significant in that there are only 5 chances in 100 that such a correlation will arise from an uncorrelated population, and a correlation of .52, based upon an equal number of pairs of variates, is highly significant and will arise in only 1 chance out of 100 from an uncorrelated population.
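These two thresholds can be checked from the relation between r and Student's t for N - 2 degrees of freedom. The sketch below assumes the two-tailed tabled values of t for 22 degrees of freedom (2.074 at the 5% level and 2.819 at the 1% level); the names are hypothetical.

```python
import math

# Two-tailed critical values of t for 22 degrees of freedom,
# entered from a standard t table (an assumption of this sketch).
T_CRITICAL_DF22 = {0.05: 2.074, 0.01: 2.819}

def critical_r(t_critical, n_pairs):
    """Smallest |r| significant at the level implied by t_critical,
    from t = r * sqrt(df) / sqrt(1 - r**2) with df = n_pairs - 2."""
    df = n_pairs - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)

r_05 = critical_r(T_CRITICAL_DF22[0.05], 24)
r_01 = critical_r(T_CRITICAL_DF22[0.01], 24)
```

Rounded to two places these give .40 and .52, the values used in the text.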

(a) Supervisory Rating Scales and Criteria 
of Teaching Ability.—As has been previously 
pointed out, the traditional method of ascer- 
taining teaching ability has been to have 
supervisors rate teachers on rating scales of 
one type or another. 

The correlations obtained from the data 
of this experiment, between the various rating 
scales and the criteria established on pupil 
changes, fail to reveal any significant rela- 
tionships between the two. Data are not here 
available to determine which is to be pre- 
ferred. It is quite possible that each measures 
a different aspect of teaching ability. 

(b) Intelligence and Criteria of Teaching Ability. The coefficients of correlation previously reported between intelligence and teaching ability have not been large. The following correlations between intelligence and teaching ability have been reported: Knight, .00; Somers, .43; Whitney, .03; Tiegs, .01; Boardman, .33; Ullman, .15; Odenweller (highest score) -.04, (median score) .00. The criteria of teaching ability for the above mentioned studies were based, however, upon the supervisory ratings of teachers.

On the significance levels of r, see J. P. Guilford, Psychometric Methods (New York: McGraw-Hill, 1936), pp. 548-549; or H. A. Wallace and G. W. Snedecor, Correlation and Machine Calculation, 1931 (Ames, Iowa), pp. 62-63 (Table 16).

TABLE XXVIII

INTERCORRELATIONS OF TEACHER MEASURES WITH CRITERIA (N = 24)

[Tabulated values not recoverable.]


Barr, Torgerson, et al. obtained a correlation of .37 between teacher intelligence and a composite of pupil gains, but it should be recalled that this composite included pupil accomplishment quotients, which tend to be unreliable. Thus, the correlation of .37 may be lower than it would have been had a more reliable index of pupil gain been used.

The correlations between intelligence, measured by the American Council Psychological Examination, and criteria C_4 and C_8 are .58 and .57 respectively. These correlations are statistically significant and indicate intelligence to be an important factor in teaching ability.

For purposes of interpretation, C_4 and C_8 (the UWH composites) have more meaning than the other single composites since C_4 and C_8 represent a more complete picture of teaching ability than do the single composites.

Intelligence as measured by the Teachers 
College Psychological Examination gives cor- 
relations which fall below the points of sta- 
tistical significance except for C, and C,, 
where the coefficients are .47 and .42, respec- 
tively, and are statistically significant. The 
coefficients of correlation between this meas- 
ure and C, and C,, .37 and .40 respectively, 
are insignificant although the latter closely 
approaches the point of statistical significance. 

Of these two intelligence tests, it appears 
that the American Council Psychological Ex- 
amination is more definitely associated with 
teaching ability as measured in this study 
than is the Teachers College Psychological 
Examination. 

(c) Personality and Criteria of Teaching Ability. The role that personality plays in teaching ability is one which persistently appears in educational literature. Among the correlations reported between measures of personality (by rating scale method) and teaching success (also measured by rating scales) the following may be noted: Ruediger and Strayer, ..7; Somers, Odenweller, 25 -.83. Barr, Torgerson, et al. obtained a coefficient of .04 between a personality rating scale and total pupil raw score. Odenweller concluded that, "The outstanding trait, the one most closely associated with effectiveness in teaching, is personality."

Previous investigations made use of rat- 
ing scales which permitted much subjectivity 
to enter and placed the measurement of per- 
sonality upon bases which could easily be 
shifted even by the same raters rating the 
same teachers at different times. In many 
instances, those who rated teachers on per- 
sonality traits also rated them on teaching 
ability so that a halo effect was obtained. 
Such halo effects, as pointed out by Knight,** 
tend to raise the coefficients of correlation. 
The correlation of personality ratings with 
supervisory ratings as measures of teaching 
ability is of doubtful validity. 

The results of the present study do not 
attach to personality as here measured the 
importance attached to it by other investi- 
gators. The correlations between personality, 
as measured by the Bernreuter Personality 
Inventory (Bn, Bs, Bd, Fs, and Fc) and the 
criteria of teaching ability do not reveal any 
statistically significant correlations. When 
scores on the Morris Trait Index L presumed 
to measure leadership are correlated with 
the criteria of teaching ability, no statistic- 
ally significant correlations are revealed. 
Scores on the Washburne Social Adjustment 
Inventory when correlated with the criteria 
of teaching ability likewise yield no statis- 
tically significant correlations. It then appears 
that personality as here measured is not im- 
portant in conditioning teaching ability when 
pupil change is employed as the criterion of 
teaching efficiency. 

(d) Attitude Towards Teaching and Criteria of Teaching Ability. Previous investigations have shown little correlation between interest in teaching and teaching ability. Barr, Torgerson, et al. found a correlation of .04 between scores on the Strong Vocational Interest Blank and pupil gains in total raw scores. Ullman found a correlation of .02 between scores on the Cowdery-Strong Interest Report Blank and teaching success. Both of these interest blanks are so devised that measures of interest in more than one occupation or profession can be obtained from one administration of the test.

A. L. Odenweller, Predicting the Quality of Teaching, Contributions to Education, No. 676 (New York: Teachers College, Columbia University).

F. B. Knight, Qualities Related to Success in Teaching, Contributions to Education, No. 120 (New York: Bureau of Publications, Teachers College, Columbia University, 1922).


For the present study, the Yeager scale, 
built upon the statements of high-school 
seniors interested in teaching, was used. 
Scores from this scale correlated with C, and 
C, to the extent of .45 each. These corre- 
lations are statistically significant and indi- 
cate a negative relationship* between in- 
terest in teaching as measured by the 
Yeager scale and teaching ability. It seems 
thus to suggest the opposite to Knight’s state- 
ment, “it is reasonable to suppose that gen- 
uine teaching interest in one’s work accounts 
for a large part of teaching success.”** 

(e) Knowledge of Subject Matter and Cri- 
teria of Teaching Ability—Most of the pre- 
vious investigations have defined knowledge 
of subject matter as the total scholarship 
record or as the scholarship record in the aca- 
demic portion of work taken by the teacher. 
Correlations obtained between such measures 
of subject knowledge and supervisory ratings 
have been reported as follows: Meriam, .28; 
Whitney, .97; Odenweller, .28. 

In the present study, the teacher's knowledge of subject matter was confined to measurements in the subject area in which teaching occurred. Correlations between measures of subject knowledge (Unit Test, Community Planning Test, and American Council Civics and Government Test) and the criteria of teaching ability yield, in most instances, no significant correlations. These teacher measures are primarily tests of information and indicate no significant relationship between knowledge of subject information and teaching ability.

Correlations between the Wrightstone Abil- 
ities test, designed primarily to measure “non- 
informational” objectives of the social studies, 
and the criteria yield significant correlations 
for all criteria except C, and C,,. 

(f) Social Attitudes with Criteria of Teach- 
ing Ability—tThe test of Social Attitudes of 
Secondary School Teachers when correlated 


* Editor's Note: Since the correlation coefficient is positive
and since low scores on the Yeager scale represent attitudes
favorable to the teaching profession, a negative relationship
between the criteria and attitudes favorable to teachers is
indicated.


** F. B. Knight, Qualities Related to Success in Teaching,
Contributions to Education, No. 120 (New York: Bureau of
Publications, Teachers College, Columbia University, 1922).







with the criteria of teaching ability revealed 
significant correlations with C,, C,, C,, and 
. Cy (.49, .50, .49, and .52 respectively); the 
correlations of this measure with C,, C,, C,, 
and C, were statistically insignificant (.26, 
.21, .29, and .29 respectively). It thus appears
that social attitudes have significant
relationship to teaching ability when the 
criteria of teaching ability include measures 
of the “non-informational” objectives of edu- 
cation; when teaching ability is expressed in 
terms of information, social attitudes have 
an insignificant relationship. 

(g) Mental Hygiene and Criteria of Teaching
Ability. The correlations of .46 and .45
between scores on the Torgerson test of mental
hygiene and C, and C, respectively, indicate
that a teacher's knowledge of mental
hygiene is sufficiently associated with teaching
ability to be important. The significant
correlation of .50 with C, and highly signifi- 
cant correlation of .54 with C, are convinc- 
ing evidence of this conclusion. 


(h) Ability to Understand Disciplinary
Problems and the Criteria of Teaching Ability 
—The Torgerson—Teacher Problems test is 
essentially concerned with whether a teacher 
can diagnose disciplinary problems and prop- 
erly proceed to correct disciplinary conduct. 
Correlations between this measure and all of 
the criteria are statistically insignificant ex- 
cept for the correlations obtained with C, 
and C, (.44 and .42 respectively). These cor- 
relations are significant and apply to criteria 
which are heavily weighted with the informa- 
tional aspects of teaching. It would then seem 
that teachers stressing information are better
equipped to diagnose and correct disciplinary 
cases than are teachers stressing other objec- 
tives of education. 

(i) Teaching Aptitude and Criteria of
Teaching Ability. The relationships of scores
obtained on the Stanford Educational Aptitudes
Test with the criteria of teaching ability
are in no case statistically significant.

(j) General Information and Freedom from 
Prejudice with the Criteria of Teaching Abil- 
ity.—The correlations of scores from the Ori- 
entation Test with any of the criteria em- 
ployed are not statistically significant and 
would seem to indicate, strangely enough, that 
teachers’ general information and freedom 
from prejudice as measured by the Orientation
Test are of no consequence in teaching
ability.

It is possible to arrange these correlations 
so as to indicate those correlations which are 
significant. These data are given in Table 
XXIX. The significant correlations between 
teacher measures and criteria (correlations 
over .40 and less than .52) are indicated by 
a single asterisk while those correlations which 
are highly significant (.52 or over) are in- 
dicated by two asterisks. The unmarked cells 
indicate that statistically insignificant corre- 
lations were obtained. 
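The two cut-off values used for the asterisks can be checked with a short calculation. The sketch below assumes that every correlation rests on the 24 teachers mentioned later in this study (22 degrees of freedom) and that the usual two-tailed t test was applied; the critical t values 2.074 and 2.819 are taken from standard tables and are not quoted in the text. The function names are illustrative only.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Convert a critical t value into the equivalent critical Pearson r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def flag(r: float, r05: float, r01: float) -> str:
    """Return the asterisk marking described for Table XXIX."""
    if abs(r) >= r01:
        return "**"  # highly significant (1% level)
    if abs(r) >= r05:
        return "*"   # significant (5% level)
    return ""        # statistically insignificant


DF = 24 - 2                  # assumption: all 24 teachers enter each correlation
R05 = critical_r(2.074, DF)  # two-tailed 5% t for 22 df (standard table value)
R01 = critical_r(2.819, DF)  # two-tailed 1% t for 22 df (standard table value)

print(round(R05, 3), round(R01, 3))
print(repr(flag(0.45, R05, R01)), repr(flag(0.54, R05, R01)), repr(flag(0.29, R05, R01)))
```

Under these assumptions the computed limits agree with the .404 and .515 given in the note to Table XXIX.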

From Table XXIX it is possible to rank 
the teacher factors associated with the criteria 
of teaching ability as follows: 

1. Intelligence
2. Social attitudes
3. Knowledge of subject matter
4. Interest in teaching*
5. Mental hygiene


The above order should not be construed 
as being fixed and immutable. Some difficulty 
was encountered in deciding whether social 
attitudes or knowledge of subject matter 
should be in second place. Yet this order is 
of particular interest in that intelligence is 


indisputably the most important single teacher 
factor associated with teaching ability. 

(3) Interpretation of intercorrelations of 
teacher measures with each other —Because 
of the extremely large number of correlations 
between the various teacher measures with 
each other and because the central problem 
of this study is to examine the relationship 
between various teacher measures and criteria 
of teaching ability objectively determined, the 
interpretation of that portion of Table XXVII 
which is concerned with the correlations of 
teacher measures with each other must nec- 
essarily be limited to those correlations which 
are most significant and relevant to the cen- 
tral problem of this study. 

The relationship between intelligence, 7,, 
and teacher ratings made by the investigator 
is statistically significant whereas such is not 
the case when the teachers are rated by their 
supervisors. 

The Health and Community Planning tests 
when correlated with the other teacher meas- 
ures show no statistically significant correla- 


* Editor's Note: High scores on the Yeager scale indicate a
critical attitude toward teachers and teaching; interest in
teaching is therefore negatively correlated with teaching
efficiency.







TABLE XXIX

SIGNIFICANT AND HIGHLY SIGNIFICANT CORRELATIONS BETWEEN TEACHER MEASURES
AND CRITERIA OF TEACHING ABILITY

[Table body not recoverable; the columns were the criteria of teaching ability.]

* Correlations statistically significant (r = .404 up to .515).
** Correlations highly significant statistically (r = .515 or over). Blank spaces denote correlations
statistically insignificant.


tions. These two tests, heavily loaded with 
information, bear no significant relationships 
with any of the other teacher measures which 
is probably due to the fact that these tests 
measured small subject areas. 


It is very interesting to observe that the 
Yeager scale, which was ranked as one of 
the best (r = .45) single teacher measures
associated with the criteria of teaching ability, 
yields no significant correlation with any of 
the other teacher measures. This indicates 
that attitude is a very desirable measure to 
incorporate into a battery of tests designed 
to predict teaching ability since this attitude 
correlated highly with the criterion and low 
with the other teacher measures. 


The highly significant correlations between 
the Morris Trait Index L scores and the 
scores on Orientation Test and the Torgerson 
Test of Mental Hygiene (.53 and .52 respectively)
appear to indicate that those possessing
a high degree of leadership as here measured,
and those who are well acquainted with
the fundamental objectives of education also 
tend to be able to diagnose the mental 
problems of their pupils. 

All of the correlations except one between 
scores made on the Torgerson Test of Teach- 
ing Problems and ratings on the teacher rating 
scales are statistically significant. We may in- 
fer from this that the ability of a teacher to 
diagnose and treat disciplinary cases is an 
important factor in the ratings of super- 
visors. 

The very highest correlations, ranging from
.62 to .91, obtained by correlating the rating
scales with each other are probably spurious
due to halo effect. Raters seem to arrive at
the same conclusion or evaluation regardless
of the scale used.


SECTION V 
PREDICTION OF TEACHING ABILITY 


From the correlations listed in Table 
XXVII it is possible to compute a number 
of multiple correlations so that the relation- 
ships between various combinations of teacher 
measures with the criteria used can be de- 
termined. By applying the Doolittle method 
to this intercorrelational matrix, the beta 
coefficients, necessary for computing both the 
multiple correlations and multiple regression 
equations, are easily obtained. 
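The computation described above can be sketched as follows. The beta coefficients are the solution of the normal equations formed from the intercorrelations of the teacher measures, and the squared multiple correlation is the sum of each beta multiplied by the corresponding correlation with the criterion. Ordinary Gaussian elimination stands in here for the Doolittle worksheet, and the three correlations used are hypothetical values chosen for illustration, not entries from Table XXVII.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_r(r_xx, r_xc):
    """Return the beta coefficients and the multiple correlation R."""
    betas = solve(r_xx, r_xc)
    r_squared = sum(b * r for b, r in zip(betas, r_xc))
    return betas, math.sqrt(r_squared)


# Hypothetical example: two teacher measures intercorrelating .30,
# correlating .50 and .40 with the criterion.
r_xx = [[1.0, 0.3], [0.3, 1.0]]
r_xc = [0.5, 0.4]
betas, R = multiple_r(r_xx, r_xc)
print([round(b, 4) for b in betas], round(R, 4))
```

Adding a further measure means adding one row and one column to `r_xx` and one entry to `r_xc`, which is how the successive pools of measures would be built up.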


The multiple correlations obtained for com- 
binations of teacher measures progressively 
increasing from two up to and including 14 
measures with each of the eight criteria of 
teaching ability are listed in Table XXX.** 

From Table XXX it is easy to ascertain
the influence of adding a new measure to the 
preceding pool of teacher measures. The successive
addition of measures, in most cases,
leads to progressive increases in the sizes of
the multiple correlations. The largest coeffi- 
cient of multiple correlation is .86 and was 
obtained between 14 variables and the cri- 
terion C,. The multiple correlations between 


** [...] progressive order, would be obtained with each of the
eight criteria of teaching ability. [...] calculations made to
obtain only the multiple correlations of combinations of two up
to and including 14 teacher measures with each of the eight
criteria.







TABLE XXX 
MULTIPLE CORRELATIONS OF TEACHER MEASURES WITH CRITERIA 


[Table body not recoverable; the columns were the eight criteria and the rows the successive pools of two up to 14 teacher measures.]


14 variables and the criteria C, and C, are 
.84 and .85 respectively. 

Rather than carry on any further analysis 
with all of the multiple correlations listed in 
Table XXX, it was thought advisable to 
restrict analysis to those multiple correlations 
which would seem to be the most meaningful. 
When the eight criteria are examined to de- 
termine which of these are most meaningful, 
it seems that C, and C,, because they are 
composites of the other criteria and because 
they embody a more complete picture of 
teacher efforts than do any one of the other 
criteria, would be most meaningful. Further 
analysis will be limited, because of this, to 
results obtained by employing these criteria, 
C, and C,. 

Since the coefficients of multiple correla- 
tion were obtained by adding each time a 
new variable to the preceding pool of vari- 
ables, it is possible to obtain a rough estimate 
of the additive value of each variable as it 
is introduced into preceding pools of teacher 
measures. By obtaining the successive differ- 
ences between the sizes of the multiple cor- 
relations for the criteria C, and C,, it appears 
that the six measures contributing most to 
these correlations arranged in descending 
order of magnitude are as follows:* 


* Editor's Note: A listing is meaningful only when the
measures and values are taken into consideration. Between T,
and T, are two measures and values, namely
T, and T,, etc.


[The two ranked lists of six teacher measures, one for each of the two criteria, are not recoverable.]


Combining these two sets by use of the mean 
rank order*® gives the following single list: 


T, (American Council Psychological 
Examination ) 
(Torgerson—Mental Hygiene) 
(American Council Civics and Gov- 
ernment Test) 
(Yeager—Attitudes Toward Teach- 
ers) 
(Community Planning—Unit IT) 
T,, (Bernreuter—Bn) 
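The combination of two ranked lists by mean rank can be illustrated with a short sketch. The labels A to G are hypothetical stand-ins for teacher measures, since the two original lists are not legible here. Giving a measure that is absent from one list the rank just below that list's last place is an assumption of this sketch, and ties are broken alphabetically; neither rule is stated in the text.

```python
def mean_rank_order(rank_lists):
    """Order items by their mean rank across several ranked lists."""
    items = set().union(*rank_lists)

    def mean_rank(item):
        # Assumption: an item missing from a list is ranked one place below its end.
        ranks = [
            lst.index(item) + 1 if item in lst else len(lst) + 1
            for lst in rank_lists
        ]
        return sum(ranks) / len(ranks)

    # Ties in mean rank are broken alphabetically (an arbitrary choice).
    return sorted(items, key=lambda it: (mean_rank(it), it))


# Hypothetical rankings of six measures under each of two criteria.
first_criterion = ["A", "B", "C", "D", "E", "F"]
second_criterion = ["A", "C", "B", "E", "G", "D"]
combined = mean_rank_order([first_criterion, second_criterion])
print(combined[:6])
```

The first six entries of `combined` correspond to a single six-item list of the kind given above.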


The selection of C, and C, for further anal- 
ysis also raised the question as to how sig- 
nificant the multiple correlations were in 
which these criteria had been used. By apply- 
ing tests of significance,*’ the following mul- 
tiple correlations were found to be significant 
at the 5% but not significant at the 1% 
level of confidence. 


J. P. Guilford, op. cit., pp. 246-247. 
™ Ibid., Table K, pp. 548-549. 







[The list of these multiple correlations is not recoverable.]

The same test of significance when further
applied to those multiple correlations obtained
with C, and C, showed the following to be
highly significant, i.e., significant at the 1%
level of confidence.

[List of multiple correlations for the two criteria; the subscripts identifying the pools of measures are not recoverable. Legible values, in the order printed: .63, .67; .64, .67; .67, .70; .73, .73; .74, .74; .77; .76; .77; .80.]

It is interesting to observe that when the
multiple correlations found for C, and C, are
interpreted on the basis of significance,
those correlations with the fewer
variables are the most significant and that
beyond a point the addition of new variables
adds little significant value.

The above lists are still rather long, and
for the sake of expediency and economy it
was thought desirable to limit all subsequent
analyses to the largest multiple correlations
in either of the lists above. This then resulted
in the following:

[Four multiple correlations, subscripts not recoverable: .77, (illegible), .83, and .82.]

Since the beta coefficients* had already
been found when the multiple correlations
were calculated, it is now very easy to obtain
four multiple regression equations for predicting
teaching ability as defined by the
criteria C, and C,. The general form of the
multiple regression equation can be written as:

$$X_c = \beta_{c1}\frac{\sigma_c}{\sigma_1}X_1 + \beta_{c2}\frac{\sigma_c}{\sigma_2}X_2 + \cdots + \beta_{cn}\frac{\sigma_c}{\sigma_n}X_n + \left(M_c - \beta_{c1}\frac{\sigma_c}{\sigma_1}M_1 - \beta_{c2}\frac{\sigma_c}{\sigma_2}M_2 - \cdots - \beta_{cn}\frac{\sigma_c}{\sigma_n}M_n\right)$$

in which $X_c$ indicates the criterion or dependent
variable and $X_1, X_2, X_3, \ldots, X_n$ the
teacher measures or independent variables, $M$ and
$\sigma$ being the corresponding means and standard deviations.

Since, however, in standard score form, the
means of the criteria equal zero and the criteria
standard deviations equal one, the general
formula can be rewritten:

$$X_c = \frac{\beta_{c1}}{\sigma_1}X_1 + \frac{\beta_{c2}}{\sigma_2}X_2 + \cdots + \frac{\beta_{cn}}{\sigma_n}X_n - \frac{\beta_{c1}}{\sigma_1}M_1 - \frac{\beta_{c2}}{\sigma_2}M_2 - \cdots - \frac{\beta_{cn}}{\sigma_n}M_n$$

The proper division of the beta coefficients
and substitutions gives the following multiple
regression equations:**

Equation 1 (tests 1-7):

$$X_c = -.004X_1 + .006X_2 + .004X_3 + .802X_4 + .033X_5 - .002X_6 + .062X_7 - 11.965$$

Equation 2 (tests 1-11; the sign of the .041 term is illegible):

$$X_c = .014X_1 + .012X_2 + .002X_3 + .559X_4 + .033X_5 - .004X_6 + .047X_7 \;[\pm]\; .041X_8 - .006X_9 - .035X_{10} - .003X_{11} - 7.890$$

Equation 3 (tests 1-6):

$$X_c = .015X_1 + .007X_2 + .001X_3 + .678X_4 + .031X_5 - .006X_6 - 8.382$$

Equation 4 (tests 1-11):

$$X_c = .016X_1 + .004X_2 - .000X_3 + 1.020X_4 + .040X_5 + .001X_6 + .020X_7 + .048X_8 - .015X_9 + .008X_{10} + .002X_{11} - 14.348$$

* A table giving these beta coefficients will be found in
the original thesis on file in Library, University of Wisconsin.

** The unknowns for these four regression equations are
designated as follows:
$X_1$ = score on the Wrightstone Abilities to Organize Res. Test
$X_2$ = score on the American Council Psychological Examination
$X_3$ = score on Social Attitudes of Secondary School Teachers
$X_4$ = score on Yeager Attitude Towards Teachers
$X_5$ = score on Torgerson Mental Hygiene Test
$X_6$ = score on Teachers College Psychological Examination
$X_7$ = score on Community Planning (Unit II)
$X_8$ = score on Health Test (Unit I)
$X_9$ = score on American Council Civics and Government Test
$X_{10}$ = score on Torgerson Rating Scale (Investigator)
$X_{11}$ = score on Bernreuter Personality Inventory, Bn
Equation 1 = predicted score on UWH [subscript illegible] (including Tests 1-7, incl.)
Equation 2 = predicted score on UWH [subscript illegible] (including Tests 1-11, incl.)
Equation 3 = predicted score on UWH [subscript illegible] (including Tests 1-6, incl.)
Equation 4 = predicted score on UWH [subscript illegible] (including Tests 1-11, incl.)






By use of a formula given by Kelly, it is 
found that the standard deviations of es- 
timated standard scores for the above equa- 
tions are equal to .634, .562, .676, and .575 
respectively.” 
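The formula itself is not reproduced in the text. If it is the usual standard error of estimate for standard scores, the square root of one minus the squared multiple correlation, then the figures above can be approximated from the multiple correlations. The sketch below applies that assumed formula to the three legible multiple correlations (.77, .83, and .82); small differences from the printed figures would be expected because those correlations are rounded to two places.

```python
import math


def standard_error_of_estimate(multiple_r: float) -> float:
    """Standard error of estimate for standard scores: sqrt(1 - R^2).

    Assumed form; the text cites the formula without stating it.
    """
    return math.sqrt(1.0 - multiple_r ** 2)


# The three legible multiple correlations retained for the regression equations.
errors = [round(standard_error_of_estimate(r), 3) for r in (0.77, 0.83, 0.82)]
print(errors)
```

The results lie within a few thousandths of the printed values .634, .562, and .575.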

The above equations were tested by sub- 
stituting for the unknowns the appropriate 
mean scores obtained from the group of 24 
teachers. The resultant prediction was equal 
to zero thus confirming the fact that the 
predicted score for the average teacher would, 
on the basis of standard scores, be equal to 
zero. 
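The check just described follows directly from the form of the equations: each raw-score weight is a beta coefficient divided by the standard deviation of its measure, and the constant term is the negative sum of the weights multiplied by the means. A minimal sketch, using hypothetical betas, means, and standard deviations rather than the values of this study:

```python
def raw_score_equation(betas, means, sds):
    """Turn beta coefficients into raw-score weights and a constant term.

    Assumes the criterion is in standard-score form (mean 0, standard deviation 1).
    """
    weights = [b / s for b, s in zip(betas, sds)]
    constant = -sum(w * m for w, m in zip(weights, means))
    return weights, constant


def predict(weights, constant, scores):
    """Predicted criterion standard score for one teacher's raw scores."""
    return sum(w * x for w, x in zip(weights, scores)) + constant


# Hypothetical values for two teacher measures.
betas = [0.42, 0.27]
means = [50.0, 120.0]
sds = [10.0, 15.0]

weights, constant = raw_score_equation(betas, means, sds)
at_means = predict(weights, constant, means)
one_sd_above = predict(weights, constant, [60.0, 120.0])
print(round(constant, 2), round(at_means, 6), round(one_sd_above, 2))
```

A teacher at the mean on every measure receives a predicted score of zero, and a teacher one standard deviation above the mean on the first measure alone receives a predicted score equal to that measure's beta coefficient.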


It is also possible to test the accuracy of 
predicted scores as against obtained scores for 
any one or all of the prediction equations 
given. As an illustration of this, the following 
scores of two teachers, selected at random, 
were substituted for the independent variables 
in each of the multiple regression equations: 

[Table of the scores of Teacher 1 and Teacher 25 on each of the independent variables; only two entries, 143 and 159, are legible.]


The predicted scores obtained as contrasted
with the observed scores are as follows:

                    Teacher 1                Teacher 25
              Predicted   Obtained     Predicted   Obtained
Equation        Score      Score         Score      Score
   1            -.534      -.480          .761       .800
   2             ...       -.480          .116       .800
   3             ...       -.010          .823       .910
   4             ...       -.010          .510       .910

[The entries in the columns headed Dependent Variable and Standard Error of Estimate are illegible. Only one predicted score for Teacher 1, -.534, is legible, and its row is uncertain.]

It is thus seen that in each case illustrated
the predicted score approximates the observed
score within a small fraction of the standard
error of estimate. These prediction equations
can then be used to predict teaching ability
with considerable accuracy.

The use of prediction equations to determine
teaching ability objectively is very useful
in several ways. For example, teacher
training institutions can determine the degree
of teaching ability of their students from the
application of several measures to the prospective
teachers. Superintendents can make
use of such prediction equations to ascertain
the teaching ability of candidates for teaching
vacancies. Yet, before prediction equations
of the sort here illustrated are to be
used on a large scale, many more studies, to
determine degrees of teaching ability objectively,
are necessary. Such studies will
have to be carried on in many subject areas,
on different grade levels, and in all parts
of the country.


SECTION VI
SUMMARY AND CONCLUSIONS


The purpose of this study is to determine
the relationship between certain teacher measures
and measurable pupil changes.


The results of this study indicate that:


1. The intelligence of the teacher is the
highest single factor conditioning teaching
ability and remains so even when in combination
with other teacher measures.

2. The social attitudes of social studies
teachers are an important factor in teaching
ability.

3. The attitude of teachers towards teaching
is significantly correlated with teaching
ability. It should be recalled that high scores
indicate a critical attitude toward teachers
and teaching.





4. Knowledge of subject matter and the
ability to diagnose and correct pupil mental 
maladjustment are each significantly associ- 
ated with teaching ability. 


5. The correlations between supervisory 
ratings of teachers and the criteria of teaching
ability used in this study are statistically
insignificant. 







6. Personality, as here defined and meas- 
ured, shows no significant relationship to 
teaching ability. 

7. The multiple regression equations give 
predicted scores which closely approximate 
the obtained scores in those examples in which
the prediction equations are used. 

The findings of this study place different 
emphases upon those qualities associated with 
teaching ability than those heretofore pre- 


sented. The fact that the criteria here used 
were those of pupil change objectively meas- 
ured appears to bring a different constella- 
tion of qualities to be associated with teach- 
ing ability. 

The findings here presented must be inter- 
preted on the basis of the experimental pat- 
tern and measures, both of pupils and teach- 
ers, used in this study. Subsequent studies are 
necessary to verify the findings here presented. 





THE MEASUREMENT OF TEACHING ABILITY 
STUDY NUMBER TWO 


J. F. Rolfe
La Crosse State Teachers College 
La Crosse, Wisconsin 


SECTION I 
THE PROBLEM 


This is the second of a series of studies 
conducted to ascertain the validity of certain 
instruments commonly employed in the meas- 
urement of teaching ability. The results 
secured from the first study’ of this series 
were sufficiently promising to suggest further 
research and the study here reported was 
undertaken to further test these findings. To 
secure comparable results, approximately the 
same sorts of teachers, schools, and measuring 
devices were employed as in the preceding 
investigation. Teaching is an exceedingly 
complex activity, and the controls are not so 
complete as in some fields of research. Thus 
this follow-up study. 


SPECIFIC DESIGN OF THE EXPERIMENT


The design of the experiment, the measur- 
ing instruments used, and the procedures 
followed were the same as those of the pre- 
ceding study except in certain details to be 
indicated later. 

In the first investigation’ only teachers of 
the 7th, 8th, or combined 7th and 8th grades 
were used. This furnished evidence concerning 
the teaching ability of teachers in small urban 
and first class state graded schools, where the 
teacher taught one or, at the most, two grades. 
The teachers in the study here reported were 
from one and two-room rural schools. Teach- 
ers from this type of school comprise one of 
the largest groups of teachers in the state of 
Wisconsin and hence merited the considera- 
tion here given. 

Following the plan as outlined in the orig- 
inal proposal, the combined 7th and 8th grade 
class in Citizenship of each participating 
school was selected for the study. The following
factors were taken into consideration in
determining the grade and subject area
selected:

1 L. E. Rostker, The Measurement and Prediction of Teaching
Ability, Unpublished Doctor's Dissertation (Madison, Wis.:
University of Wisconsin, 1939).


(1) Citizenship was being taught during 
the 1937—1938 school year at the 7th- and 
8th-grade level in all the rural schools. 

(2) Pupil change in the area of the
Social Studies was considered of growing 
importance among educational objectives. 

(3) The objectives of Citizenship were
broad enough to allow for considerable variability
in the techniques of teaching used
by the teachers selected. 

(4) More desirable measuring instruments
were available in this area and at
this grade level than in certain other areas 
and grade levels. 


The following limitations were set up in the 
selection of participating schools: 


(1) The schools were to be one- or two- 
room rural schools, employing one and no 
more than two teachers. 

(2) Citizenship was to be taught at the 
7th- and 8th-grade level throughout the 
course of the school year. 

(3) At least five pupils were to have 
been enrolled in the combined 7th and 8th 
grades. 

(4) The teacher must be willing to par- 
ticipate in the investigation. 


Shortly after the opening of the school year 
in the fall of 1937 a large number of schools
were visited and the proposed plan described 
to the teachers. The teachers in those schools 
meeting the above requirements were invited 
to participate. A sufficient number of teachers 
agreed to participate, and a group of 72 
schools located in Eastern Dane, Western
Dane and Columbia Counties within a radius 
of 35 miles of Madison were selected. 

The participating teachers agreed to accept 
the following general objectives for the year’s 
course in Citizenship: 







The teacher should aim in the citizenship 
course to develop the pupil’s ability to dis- 
charge his socio-civic responsibilities with 
intelligence and efficiency. This, as set forth 
in the directions to the teachers, was to 
involve: (1) efficiency in human relation- 
ships: the ability to get along with other 
people; the ability to work cooperatively 
with others in group enterprise; the ability 
to subordinate one’s individual gain to 
group welfare where one conflicts with the 
other, etc.; (2) understanding of the funda- 
mentals of socio-civic relationships: knowl- 
edge of one’s civic rights, duties, and re- 
sponsibilities; knowledge of social, eco- 
nomic, and political principles and practices; 
knowledge of moral, ethical, and religious 
conventions; understanding of the prin- 
ciples and practices of the government 
under which one lives, etc.; (3) the mastery 
of the tools for effective thinking: the 
ability to see and solve problems; the abil- 
ity to collect relevant data, make the neces- 
sary analyses, and reach judgments based 
upon fact; the ability to suspend judgment, 
maintain an open mind, and view solutions 
critically in arriving at decisions, etc.; and 
(4) interest in socio-civic relationships,
activities, and responsibilities: willingness 
to accept socio-civic responsibilities; the 
faithful discharge of these responsibilities; 
and an attitude of tolerance toward the 
opinions and actions of others. 


The procedure followed in the collection of
the data was: (1) the administration of a
battery of tests designed to measure the general
objectives of the year’s course to all participating
pupils at the beginning and end of
the school year so as to obtain measures of
pupil changes occurring over a six-month
period; (2) the application of appropriate
pupil measures just prior to and immediately
following the teaching of two three-week
units in the field of citizenship, one of
these units to be taught in the fall of the
year, the other in the spring of the
same school year;* (3) the application of an
intelligence test and a reading test to each
pupil at the beginning of the year to be employed
in the equating of groups; and (4) the
application of various measures to the teacher.
The validation of the measures applied to the 


* These two units were recommended in the State Course of
Study and were titled as follows: Unit I, Safeguarding Public
Health; and Unit II, Community Planning.




teachers is one of the primary concerns of this 
investigation. 

Several weeks before the teaching of the 
first unit was begun each teacher was visited 
by the investigators, and final arrangements 
were made for the administration of the vari- 
ous pupil measures. Due to the number of 
teachers and the number of tests to be admin- 
istered, it was necessary to stagger the testing 
periods from school to school. The time 
schedule for each school, however, was care- 
fully controlled so that the periods between 
initial testing and final testing for the long-time
changes were six months, and for the
units three weeks. The number and length of 
citizenship class periods per week were held 
constant for all teachers. 


Several weeks before the teaching of the 
first unit on public health was to begin, a 
letter was sent to each teacher giving the 
dates upon which the testing and teaching 
were to begin. A statement of the topics to be 
covered and the general objectives were also 
sent as a guide in teaching the unit.* The 
teacher was free to employ whatever means 
she deemed appropriate, that is, she was free 
to use any subject matter, or materials that 
she thought best in the attainment of the 
objectives. Before any teaching was begun two 
batteries of “over all” long time measures 
were applied to the pupils, namely, (1) three 
Wrightstone tests, and (2) three Hill tests.‘ 
Immediately prior to the teaching of unit one, 
the Health test designed to measure the objec- 
tives of this unit was administered to each 
class. Teaching on this unit then continued 
for 13 successive days and on the 15th day 
the same test given as a pretest was admin- 
istered as a final test. Following the teaching 
of this unit, which took place in October, the 
teacher resumed her normal course of study. 
In the spring of the same school year the 
teachers were informed of the exact dates for 
the teaching of the second unit on Community 
Planning. As in the first unit, the teachers 
were sent a list of topics and the general 
objectives to be followed. Shortly after the 
teaching of the second unit, the pupils were 
again given the two batteries of tests, namely, 


3 See Appendix “B” of original thesis on file, University of
Wisconsin Library, for instructions and announcements sent to
the teachers.

4 The tests applied to both pupils and teachers will be
described in the next section.







TABLE I 


SEX, AGE, YEARS OF TEACHING EXPERIENCE, YEARS OF TRAINING BEYOND HIGH SCHOOL,
AND MONTHLY SALARY FOR EACH PARTICIPATING TEACHER

[Table body not recoverable. Column means: age, 28.26 years; teaching experience, 8.05 years; training beyond high school, 1.46 years; monthly salary, $91.75.]


(1) three Wrightstone tests, and (2) three 
Hill tests, which had been administered as 
initial tests in the fall.° 


DESCRIPTION OF THE TEACHERS PARTICI- 
PATING IN THIS INVESTIGATION 


This investigation started with 72 teachers. 
Fifteen of those who started the investigation 
were subsequently dropped because of incom- 
plete data, leaving a total of fifty-seven 
schools from which complete pupil and 
teacher data were obtained. 

The 57 teachers participating in this inves- 
tigation were teaching in either one-room or 
two-room rural schools in rural areas of Dane 
and Columbia counties. Table I summarizes 
the information concerning five factors, 
namely, sex, age, years of teaching experi- 
ence, years of training beyond high school, 
and monthly salary for each participating
teacher. It will be noted that fifty of the
teachers were women, seven were men. The
range of chronological age extended from 20
to 54 years inclusive, with a mean age of
28.3 years and a median of 26 years.

5 All final pupil tests were administered by the investigators
with two or three exceptions where administrative difficulties
prevented following this procedure.


The total years of teaching ranged from 1 
to 30 years with a mean of 8.05 years and a
median of 7 years. This was the first year of
teaching for five teachers. 

All teachers had at least one year of train- 
ing beyond high school with the average being 
1.46 years. Thirty-two teachers had attended 
a county normal school for only one year. 
Three teachers were college graduates and 
one had received a Master’s degree. 

The monthly salaries ranged from sixty- 
five dollars to one hundred forty-five dollars, 
with an average monthly salary of ninety-one 
dollars and seventy-five cents. 

The typical teacher participating in this 
investigation could be described as a woman 
28.3 years old, having taught 8.05 years, re- 
ceived 1.46 years of professional training and 
earned $91.75 per month for the school year 
1937-38. 







TABLE II 
SCHOOL, GRADE, AND NUMBER OF PUPILS PARTICIPATING IN THIS STUDY

[Table body not recoverable. Total number of pupils participating: 404.]

DESCRIPTION OF THE PUPILS PARTICIPATING 
IN THIS INVESTIGATION


There were 404 pupils in the 57 classes 
who participated in this investigation. The 57 
classes ranged in size from 4 to 18 with an 
average of seven pupils per class. The small 
average class size was due to the fact that all 
but seven schools were one-room rural schools. 
Table II furnishes information relative to 
schools, grades and number of pupils partici- 
pating in this investigation. 

For the particular area of the curriculum 
investigated by this study all schools followed 
the practice of combining the seventh and 
eighth grades. 


DESCRIPTION OF THE SCHOOLS PARTICIPATING 
IN THIS INVESTIGATION


The schools participating in this investiga- 
tion include 50 one-room rural schools and 
seven two-room rural schools. A one-room 
rural school is a school having one classroom 
with one teacher who teaches all subjects in 
grades one through eight. A two-room rural 
school is one having two classrooms, one 
teacher teaching grades one through four, the 
other teaching grades five through eight. 

The typical school may be described as a
one-room rural school with one teacher, a
total enrollment of 21 pupils with seven pupils
in grades seven and eight. The district per-pupil
valuation is $12,315.00 and the total per-pupil
expenditure for the school year is $65.00.
Table III shows this information for each
participating school.


SECTION II 


DESCRIPTION OF TEACHER AND 
PUPIL MEASURES EMPLOYED IN 
THIS INVESTIGATION 


In the formulation of the plan of this in- 
vestigation a definite effort was made to 
include within its organization as many as 
possible of the desirable results of previous 
studies. Three significant conclusions drawn 
from previous investigations guided in the 
selection of measuring instruments: 


1. To measure successfully the character- 
istics of teaching ability it is necessary to 
measure a number of important aspects of 
teaching. 

2. To use desirable changes in pupils as a
criterion of teaching success, it is necessary
that measures of change other than that of
academic achievement be included, all
changes in the pupil being important.

3. Measures of teaching ability must be as 
valid, reliable, and objective as possible. 


TABLE III
INFORMATION ABOUT THE SCHOOLS PARTICIPATING IN THIS STUDY

School    Type of School*    Total School Enrollment    Per Pupil School Valuation    Per Pupil School Cost
Ave.      1-room             21                         $12,315                       $65

[Rows for the 57 individual schools are not recoverable from the source scan.]

*A one-room school is to be interpreted as a school having one room and one teacher who teaches
grades I through VIII. A two-room school is to be interpreted as a school having two rooms and two
teachers, one teacher teaching grades I-IV, and the other teacher teaching grades V-VIII.


TEACHER MEASURES


The teacher measures employed were in the
main the same as those employed in Rostker’s
investigation. The reader is referred to Rostker’s
study (pages 15-20) for a description
of these measures. Four measures were added,
namely,


1. A scale for evaluating the personal fitness
of teachers. (Mimeographed, Department
of Education, University of Wisconsin,
Madison, Wisconsin, 1937.) This teacher
rating scale consists of 33 teacher traits such
as accuracy, health, loyalty, sociability, thrift,
etc., each of which is rated on an eleven-point
scale (0-10). If the rater could think of no
teacher, teaching at the same grade level, who
was better with respect to a given trait, he
was instructed to check the “0” value. If the
rater could think of one teacher who was
better, the rating was to be “1”, and so on.
As the rater recalled increasing numbers of
teachers who exceeded the teacher being rated
on any given trait, other values on the scale
were to be checked. The score a teacher receives
is the arithmetic average of the numbers
checked by the rater for the 33 traits
enumerated. Low scores are desirable since
they indicate that the rater could on the average
think of relatively few teachers who surpassed
the teacher being rated. High scores
on the other hand indicate that the rater
could readily think of numbers of teachers
who in his judgment were better with respect
to these traits.

Three raters rated each teacher on this 
scale, namely, the county superintendent of 
schools, the county supervisor, and the inves- 
tigator. The three resultant ratings were aver- 
aged to obtain the index that was used in 
subsequent analysis of this instrument. 

2. A personality rating scale. (Mimeographed,
Department of Education, University
of Wisconsin, Madison, Wisconsin, 1937.)
In this scale the rater was asked to assign
one of eleven intensity values to each of
six terms which are descriptive of the total
personality effect the teacher has upon others.
The six terms upon which teachers were to
be rated are: pleasing, forceful, wholesome,
interesting, stimulating, and confidence-
inspiring. For each term, the rater checked
either 0, 1, 2, ... or 10 depending upon
whether he could think of none, one, two,
etc., or ten persons who were superior to the
teacher being rated. The total score was the
arithmetic average of the six numbers assigned
by the rater and represents the number of
persons he recalled who, in his judgment,
were superior in total personality effect to the
teacher rated. From this point of view low
scores are desirable. The score on this scale
used in subsequent discussions is the average
of the ratings assigned to each teacher by the
three raters as described above.


3. A test of teacher-pupil relationship.
(Department of Education, University of
Wisconsin, Madison, Wisconsin, 1938. Mimeographed
form used by special permission of
the authors, T. L. Torgerson, Bjarne Ullsvik,
and Lawrence Wahlstrom.) This test is designed
to measure a teacher’s understanding
of mental hygiene principles and their appli- 
cation to specific teacher and pupil situations. 
It consists of six parts. Part I is a measure 
of a teacher’s understanding of symptoms and 
causes of pupil maladjustments. Part II is 
used to appraise classroom situations. Twenty- 
eight behavior traits found in a typical class- 
room are listed. The teacher is to indicate on 
a four-point scale the frequency of occurrence 
of each behavior trait in her classroom. Part 
III tests the teacher’s power to evaluate 
behavior traits. Part IV is a measure of the 
teacher’s ability to successfully apply prin- 
ciples of mental hygiene. Twenty descriptions 
of child behavior common to the ordinary 
classroom are listed. From a master list of 
37 procedures for caring for behavior situa- 
tions, a teacher selects the ones she deter- 
mines to be applicable. Part V sets forth 
eleven problem situations that frequently 
arise in the classroom. From among 25 pro- 
cedures, the teacher selects those which are 
most desirable for correcting the particular 
situation. Part VI is designed to secure a 
measure of each teacher’s classroom practices. 
Forty-eight common classroom practices are 


listed. The teacher indicates whether she uses 
each practice always, usually, sometimes, 
rarely or never. 

4. The Sims score card for socio-economic
status, Form C. (Public School Publishing
Company, Bloomington, Illinois, 1927.) The
purpose of this Score Card is to provide a
simple, convenient, and objective device for
ascertaining and recording the general cultural,
social, and economic background of
those to whom it is applied. It may be used for
determining the socio-economic status of any 
social group. It was used in this investigation 
for the purpose of obtaining numerical ratings 
which would permit a statistical study of 
socio-economic status as a factor in teaching 
efficiency. Home conditions need no longer be 
recorded as good, average, or poor, but may 
be given a numerical rating that is far more 
precise than the usual verbal characterizations. 
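
The scoring rule shared by the two rating scales described above (items 1 and 2) can be sketched as follows. This is only an illustration: the function names and the sample ratings are hypothetical, and the six-trait example stands in for either the 33-trait or the six-term scale.

```python
from statistics import mean

def rater_score(trait_ratings):
    """Score from one rater: arithmetic mean of the 0-10 trait ratings."""
    if any(not 0 <= r <= 10 for r in trait_ratings):
        raise ValueError("each rating must lie on the 0-10 scale")
    return mean(trait_ratings)

def rating_index(ratings_by_rater):
    """Index for one teacher: mean of the per-rater scores.

    Low values are desirable, since each rating counts the teachers
    the rater could recall who surpassed the teacher being rated.
    """
    return mean(rater_score(r) for r in ratings_by_rater)

# Hypothetical ratings by the three raters (superintendent, supervisor,
# investigator); these are not data from the study.
ratings = [
    [2, 3, 1, 2, 4, 0],
    [1, 2, 2, 3, 3, 1],
    [0, 1, 1, 2, 2, 0],
]
print(rating_index(ratings))
```

Averaging within each rater first and then across raters mirrors the order of operations given in the text.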

The specific area measured by a certain 
test is often difficult to determine. Most in- 
telligence examinations contain a share of 
material that measures information. Rating 
scales possess items that refer to many differ- 
ent characteristics of teachers. However, the 
eighteen measures used in this study may be 
grouped as follows: 


Intelligence:
American Council Psychological Examination.
Teachers College Psychological Examination.

Knowledge of subject matter:
American Council Civics and Government Test.
Hartmann: Public Problems Information Test.

Personality:
The Bernreuter Personality Inventory: Bn, Bd, and Bs.
Morris Trait Index, L.
Washburne: Social Adjustment Inventory.
Barr and Others: A Scale for Evaluating the Personal Fitness of Teachers.
Barr and Others: Personality Rating Scale.

Teacher Rating Scales:
Torgerson Diagnostic Teacher Rating Scale of Instructional Activities.
Almy-Sorenson Rating Scale for Teachers.
The Michigan Teacher Rating Scale.

Aptitudes:
Stanford Educational Aptitudes Test.

Attitudes:
Hartmann: Social Attitudes of Secondary School Teachers.
Wrightstone: Scale of Civic Beliefs.
Lewerenz-Steinmetz: Orientation Test Concerning Fundamental Aims of Education.
Yeager: Scale for Measuring Attitudes Toward Teachers and Teaching.
Torgerson: A Test of Teacher-Pupil Relationship.

Socio-Economic Status:
Sims Score Card for Socio-Economic Status.


The pupil tests were the same as those
employed by Rostker, except that the two unit
tests were revisions of those used by Rostker.
The Sims Score Card was not used with pupils
as in Rostker’s study.


TABLE IV
PUPIL TESTS: MEANS AND STANDARD DEVIATIONS FOR THE TOTAL POPULATION OF 404 PUPILS

[Table body not recoverable from the source scan.]


SECTION III 


THE DEVELOPMENT OF THE 
CRITERION OF TEACHING 
ABILITY 


The criterion of teaching ability employed 
in this investigation was that of measurable 
changes produced in pupils. In general the 
same tests, except as already noted, and the 
same procedures were employed in develop- 
ing the criterion in this study as in Rostker’s 
study. 


The means and standard deviations for the
final, initial, and change scores for each of
the eight pupil tests were calculated. These
measures with the means, standard deviations,
and the number of pupils in each class for
the 404 pupils in this study are listed in
Table IV. Nine separate criteria of teaching
ability were calculated for each teacher, one
for each pupil gain measure, and one from a
combination of the eight measures taken
together.


TABLE V
INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF FINAL STATUS
N = 404

[Table body not recoverable from the source scan.]


TABLE VI
INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF INITIAL STATUS
N = 404

[Table body not recoverable from the source scan.]

In order to derive the composite score it 
was necessary to consider the intercorrelations 
of the tests. The intercorrelations of the eight 
final test scores, the eight initial test scores, 
and the eight change scores were calculated 
with the results reported in Tables V, VI, and 
VII. A test was made to determine how large 
the intercorrelations must be in order to be 
significantly different from zero. Correlations 
larger than .10 were taken to indicate the 
presence of a real relationship with consider- 
able assurance. 
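
The size a correlation must reach to differ significantly from zero can be approximated as in the sketch below. The source does not say which test was used; this assumes a two-sided 5% test and substitutes the large-sample normal deviate for the exact t value, which is reasonable for N = 404 but not for small samples. The function name is illustrative.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided) for n pairs.

    Uses r = z / sqrt(z**2 + df) with df = n - 2, where z is the normal
    deviate standing in for the t value (a large-sample assumption).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    df = n - 2
    return z / math.sqrt(z ** 2 + df)

print(round(critical_r(404), 3))
```

For N = 404 this gives a value just under .10, in line with the threshold adopted in the text.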

To obtain the UWH composite pupil gain,
the mean pupil gains (Table A)* for the
eight tests were divided by the appropriate
pooled sigmas and then composited according
to their reliabilities in a 1:1
ratio with the heaviest weights to the three
Wrightstone tests. To secure a single yardstick
for the initial and final tests the deviation
scores were divided by a pooled sigma
developed from the initial and final sigmas
according to the following formula:

$$\sigma_{\text{pooled}} = \sqrt{\frac{\sigma^2_{\text{Initial}} + \sigma^2_{\text{Final}}}{2}}$$

* See original thesis on file, University of Wisconsin Library.

The initial, final, and pooled sigmas are
reported in Table VIII.

The resultant weighted average became:

$$\text{UWH composite gain} = .082U_{1g} + .073U_{2g} + .357W_{1g} + .539W_{2g} + .564W_{3g} + .322H_{1g} + .304H_{2g} + .320H_{3g}$$
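
As a rough check on the pooled-sigma formula, the sketch below recomputes the pooled sigma for the first two rows of Table VIII and takes its reciprocal. It assumes that those rows are the two unit tests (U1 and U2), which the scan does not label, and the function name is illustrative.

```python
import math

def pooled_sigma(sigma_initial, sigma_final):
    """Root mean square of the initial and final standard deviations."""
    return math.sqrt((sigma_initial ** 2 + sigma_final ** 2) / 2)

# (initial sigma, final sigma) as read from Table VIII; the assignment of
# these rows to U1 and U2 is an assumption.
table_viii = {"U1": (11.84, 12.56), "U2": (13.44, 14.04)}

# Dividing a gain by the pooled sigma is the same as weighting it by
# the reciprocal of the pooled sigma.
unit_weights = {
    test: round(1 / pooled_sigma(initial, final), 3)
    for test, (initial, final) in table_viii.items()
}
print(unit_weights)
```

Under that assumption the reciprocals come out at .082 and .073, the weights the composite formula gives to the two unit tests.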








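TABLE VIII
THE INITIAL, FINAL, AND POOLED SIGMAS FOR THE EIGHT PUPIL MEASURES

                                                  Standard Deviations
Pupil Test                                        Final      Initial
Unit I, Health                                    12.56      11.84
Unit II, Community Planning                       14.04      13.44
Wrightstone, Organize Research Material           16.82      21.96
Wrightstone, Civic Beliefs                        13.43      12.54
Wrightstone, Ability to Generalize                12.34      12.61
Hill Information                                   3.19       3.02
Hill Attitude                                      3.48       ...
Hill Action                                        3.10       ...

[The pooled column and the last two initial values are not legible in the source scan; row labels other than Community Planning and Civic Beliefs follow the test order used elsewhere in the study.]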







The average UWH composite pupil gain,
increased by 10, is listed for each class in
column (2) of Table IX. The constant 10
was added throughout merely for the convenience
of eliminating minus signs.

The eight separate measures of pupil change
were thus reduced to the U, W, and H composites
and represent the average progress
made by pupils of the 47 classes toward the
attainment of the educational goals set forth
as measured by the eight tests. These three
composites represent three separate criteria of
teaching ability.

TABLE IX
OBSERVED, PREDICTED, AND RESIDUAL PUPIL GAINS FOR THE 47 CLASSES

Class and Teacher Number    Observed UWH comp. pupil gain    Predicted UWH comp. pupil gain    Residual pupil gain, PGTA (37-38)    Rank of Teacher, PGTA (37-38)

[Table body not recoverable from the source scan.]


TABLE X
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL
FACTORS FOR 57 CLASSES⁹

Pupil Factors    F    Chi-Square

[Table body not recoverable from the source scan; the legible entries are F = 1.067 and 2.247 and Chi-Square = 90.305 and 36.376.]


One of the major purposes for making the
composite of the eight tests into three sets,
U, W, and H, was to increase the reliabilities
of the tests, particularly as measures of
change. The next step was to calculate the
reliabilities of the final, initial, and change
composites. This was done according to a
formula taken from Kelley.⁷ These reliabilities
are given below:


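RELIABILITIES OF COMPOSITES OF PUPIL
TEST SCORES

                          Final    Initial    Change
Unit Composite             ...       ...       .464
Wrightstone Composite      ...       ...       .690
Hill Composite             ...       ...       .187
UWH Composite              ...       ...       .684

[Of the Final and Initial columns only the entries .712, .901, .824, and .928 are legible in the source scan, and their rows cannot be assigned.]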


These composite reliabilities are in most 
instances substantially higher than the indi- 
vidual test reliabilities. The reliabilities of the 
change scores are still substantially lower than 
were those of the final and initial scores. To 
further stabilize these change scores a single 
composite score was developed from the U, 
W and H composites. 

In order, however, to make comparisons of
the changes purportedly produced by each of
the 57 teachers with her group of pupils it
was necessary that the groups be homogeneous.
Tests of homogeneity were made with respect
to M.A., I.Q., Reading, U composite initial
test score, W composite initial test score, H
composite initial test score and UWH composite
initial test score. Table A (Appendix
A)* lists the means and standard deviations
for pupil C.A., M.A., I.Q., and Reading, for
each class and for the total population. To
test the homogeneity of the 57 groups the F
test as described by Snedecor⁸ was employed.


⁷ T. L. Kelley, Statistical Method (New York: Macmillan, 1923), p. [page number illegible].
⁸ Snedecor, [citation illegible in the source scan] (Ames, Iowa).
⁹ The 5% level means that if the experiment were repeated one hundred times under the same
conditions each time, a difference as large as the one observed would occur due to chance factors only five times
out of the hundred. Whether one accepts a 5% level or a 1% level as significant is in the last analysis a matter of
[word illegible]. The 5% level is commonly used in statistical [remainder illegible].
* See original thesis on file, Library, University of Wisconsin.


1.38 
1. 57 


. 196 


If the value of F is larger than the 5% or
1% value, the hypothesis that the means of
the 57 classes are homogeneous is rejected at that level.
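
The F test of homogeneity of class means can be sketched as a one-way analysis of variance. This is a minimal illustration rather than the computation actually carried out in the study; the function name and the sample scores are hypothetical.

```python
from statistics import mean

def f_ratio(classes):
    """F ratio for homogeneity of class means.

    Between-class mean square divided by within-class mean square,
    with k - 1 and N - k degrees of freedom.
    """
    k = len(classes)
    n_total = sum(len(c) for c in classes)
    grand_mean = mean(x for c in classes for x in c)
    ss_between = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in classes)
    ss_within = sum((x - mean(c)) ** 2 for c in classes for x in c)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical I.Q. scores for three small classes (not data from the study).
example = [[98, 102, 100, 104], [95, 99, 97, 101], [108, 112, 110, 114]]
print(f_ratio(example))
```

The resulting ratio is then compared with the tabled 5% and 1% values for the corresponding degrees of freedom.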

The Chi-square test¹⁰ was employed to test
the homogeneity of class variances. The
values of F and Chi-square obtained from
these tests as well as the corresponding 5%
and 1% values necessary to determine
whether the calculated values refute the
hypothesis of homogeneity or not are listed
in Table X.
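
A chi-square test of homogeneity of variances can be sketched as below. The source cites Rider for the test without naming it; the sketch assumes Bartlett's test, which is one standard chi-square test of this kind and may not be the exact form used. The second function is the large-sample device described in the Fisher footnote, where n is the number of degrees of freedom. Function names and sample data are hypothetical.

```python
import math
from statistics import variance

def bartlett_chi_square(classes):
    """Bartlett's statistic for equality of k variances (k - 1 d.f.)."""
    k = len(classes)
    dfs = [len(c) - 1 for c in classes]
    variances = [variance(c) for c in classes]
    df_total = sum(dfs)
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    m = df_total * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return m / correction

def fisher_normal_deviate(chi_square, n):
    """sqrt(2 * chi-square) - sqrt(2n - 1), treated as a unit normal
    deviate when n is too large for the chi-square tables."""
    return math.sqrt(2 * chi_square) - math.sqrt(2 * n - 1)

# Hypothetical scores for three classes (not data from the study).
example = [[10, 12, 14, 16], [20, 21, 22, 23], [5, 9, 13, 17]]
print(bartlett_chi_square(example))
```

With 57 classes the statistic has 56 degrees of freedom, which is why the normal-deviate form is needed in place of the tables.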

An inspection of Table X reveals that the 
values of F, with the exception of M.A., are 
much higher than the 5% value which indi- 
cates that the means of the 57 classes relative 
to these factors are too discrepant to be con- 
sidered homogeneous. It was further observed 
that the Chi-Square values, with the excep- 
tion of Unit Composite Initial score and 
Wrightstone Composite Initial score are 
higher than the 5% value which indicates 
that the classes are not homogeneous with 
respect to their variances on these factors. 

This variation of pupil factors has been
observed in most researches of this sort and
requires the elimination of a number of pupils
to arrive at homogeneous groups. It was necessary
in this investigation to arrive at as
many homogeneous groups as possible with
respect to class means of M.A., I.Q., Reading,
Unit Composite Initial score, Wrightstone
Composite Initial score, Hill Composite Initial
score and the UWH composite. By trial
and error those classes having extreme variances
on the pupil factors considered were
eliminated. Ten classes were finally deleted
in order that the remaining classes might represent
a group whose variances were homogeneous
in the several pupil factors. These
classes were numbered 1, 5, 10, 13, 24, 39,
49, 52, 54, and 55. All subsequent calculations
were based upon the remaining 47
classes with a total of 338 pupils. Table XI
lists the Chi-square and F test values for testing
the homogeneity of the means and standard
deviations of the remaining 47 classes
together with the 5% and 1% values which
are necessarily based upon somewhat different
degrees of freedom than for the previous 57
classes.

Inspection of Table XI reveals that all
classes are consistent with the hypothesis of
homogeneous variances. Only the M.A. factor
is within the 1% limit for the “F” value.


TABLE XI
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL
FACTORS FOR 47 CLASSES

Pupil Factors                                F
M.A.                                       1.087
I.Q.                                       1.832
Reading                                    2.016
Unit Composite Initial Score               4.007
Wrightstone Composite Initial Score        4.013
Hill Composite Initial Score               2.230
UWH Composite Initial Score                3.388

5% Value                                   1.40
1% Value                                   1.57

[Of the Chi-Square column only the entries 64.606, 34.754, 64.849, and 37.209 are legible in the source scan, and their rows cannot be assigned. Row labels not legible in the scan follow the order in which the factors are listed in the text.]


¹⁰ P. R. Rider, An Introduction to Modern Statistical Methods
(New York: John Wiley and Sons, 1939), pp. 102-103.
¹¹ R. A. Fisher, Statistical Methods for Research Workers
(London: Oliver and Boyd, 1934), p. 62. Since Chi-Square
Tables do not go beyond N = 30, Fisher uses the test
$\sqrt{2\chi^2} - \sqrt{2n - 1}$, which gives a “t” value, and the value of
“t” for 5% and 1% is read from Fisher’s Tables. Values
of “t” may be either plus or minus.


The UWH composite of the mean pupil
changes in the eight pupil measures for these
47 classes was employed in the further development
of the criterion. These mean pupil
changes, however, are assumed to be a func- 
tion of three factors, (1) the efforts of the 
teacher, (2) the abilities of pupils, and (3) 
other factors not measured. It was assumed 
that the 47 classes were homogeneous on all 
factors not measured in this investigation. 
Any differences in mean pupil changes among 
the 47 classes may then be said to be a func- 
tion of the efforts of the teacher and pupil 
abilities plus constant factors. 

To secure an index of the teacher’s efforts
with the effect of pupil abilities such as M.A.,
I.Q., Reading, and previous knowledge of the
subject held constant, a series of intercorrelations
were calculated. The intercorrelations
of pupil tests as measures of initial status and
as measures of change are given in Table XII.
The correlations of I.Q., M.A., and Reading
with the pupil tests as measures of initial
status, final status, and pupil change are given
in Tables XIII, XIV, and XV.


TABLE XII

INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF INITIAL STATUS WITH PUPIL
TESTS AS MEASURES OF CHANGE

N = 338

               Initial Status
Change       U2        W1        W2
U1         -.010      .135      .059
U2         -.378      .005      .032
W1         -.030     -.671      .006
W2         -.028      .003     -.415
W3          .012     -.105      .057
H1          .071      .078      .003
H2          .013      .062      .067
H3          .038      .026      .073

[The remaining columns are not recoverable from the source scan. The row labels and the W2 column heading are inferred from the position of the large negative entries.]

U1: Unit I, Health Test.
U2: Unit II, Community Planning.
W1: Wrightstone, Ability to Organize Research Material.
W2: Wrightstone, Scale of Civic Beliefs.
W3: Wrightstone, Ability to Generalize.
H1: Hill Information.
H2: Hill Attitude.
H3: Hill Action.
Uc: Unit Composite.
r_ic: Correlation between initial status and change.

In developing the regression equation for
partialing out the effect of variable pupil
abilities between classes the intercorrelations
of each pupil factor with the mean pupil
change were then calculated. These are given
in Table XVI.


TABLE XIII

CORRELATIONS OF I.Q., M.A. AND READING
WITH PUPIL TESTS AS MEASURES
OF INITIAL STATUS

N = 404

I.Q.      Reading
.451       .571
.476       .444
.823       .345
.385       .807
.407       .480
.377       .580
.385       .479
.304       .499

[The M.A. column and the pupil-test row labels are not legible in the source scan.]


TABLE XIV

CORRELATIONS OF I.Q., M.A. AND READING
WITH PUPIL TESTS AS MEASURES
OF FINAL STATUS

N = 404

M.A.      Reading
.625       .558
.615       .527
.543       .515
.372       .409
.495       .584
.547       .518
.555       .529
.488       .545

[The I.Q. column and the pupil-test row labels are not legible in the source scan.]


TABLE XV

CORRELATIONS OF I.Q., M.A. AND READING
WITH PUPIL TESTS AS MEASURES
OF CHANGE

N = 404

I.Q.
 .039
-.008
 .049
-.005
 .011
 .087
 .089
 .045

[The M.A. and Reading columns and the pupil-test row labels are not legible in the source scan.]

The beta coefficients for the regression
equation for predicting UWH composite gain
from M.A., I.Q., Reading, and UWH composite
pre-scores were obtained from the
above correlation table by Aitken’s method of
pivotal condensation.¹² The resultant means,
sigmas, beta coefficients and regression coefficients
are given in Table XVII.
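
The step from a beta coefficient to a regression coefficient can be sketched as follows. The relation b = beta × (sigma of the criterion / sigma of the predictor) is standard; the assignment of the legible Table XVII sigmas to variables (7.30 to M.A., 5.99 to I.Q., 7.25 to the UWH composite change) is an assumption, since the scan does not label the rows.

```python
def regression_coefficient(beta, sigma_predictor, sigma_criterion):
    """Raw-score regression weight from a standardized beta weight."""
    return beta * sigma_criterion / sigma_predictor

# Sigma of the criterion (UWH composite change); assumed row assignment.
SIGMA_CHANGE = 7.25

b_ma = regression_coefficient(0.617, 7.30, SIGMA_CHANGE)   # M.A.
b_iq = regression_coefficient(-0.411, 5.99, SIGMA_CHANGE)  # I.Q.
print(round(b_ma, 3), round(b_iq, 3))
```

Under these assumptions the two values round to .613 and -.497, the M.A. and I.Q. coefficients that appear in the regression equation.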

The constant term in the regression equation
is equal to -47.35. The regression equation
is written thus:

$$X_5 = .613X_1 - .497X_2 + .254X_3 - .511X_4 - 47.35$$

where $X_5$ = predicted average UWH composite change.
$X_1$ = M.A. class mean.
$X_2$ = I.Q. class mean.
$X_3$ = Reading class mean.
$X_4$ = UWH composite initial status class mean.


A predicted average pupil change was cal- 
culated for each of the 47 classes by means 
of this prediction equation. These are listed 
in column 3 of Table IX. 

That portion of the observed pupil change
UWH composite, $C_o$, less the predicted pupil
change, $C_p$, gave an amount, $C_t$, thought to be
due to the influence of the teacher; that is,
$C_o - C_p = C_t$. Table IX¹³ lists the values of
$C_o$, $C_p$, and $C_t$ for each of the 47 classes.
Thus column 4 in Table IX lists an index of
teaching ability for each of the 47 teachers.
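
The prediction and residual steps can be sketched as below, using the coefficients of the regression equation given above. The function names and the class means in the example are hypothetical and are not taken from Table IX.

```python
def predicted_gain(ma, iq, reading, uwh_initial):
    """Predicted average UWH composite change for a class, from its means."""
    return (0.613 * ma - 0.497 * iq + 0.254 * reading
            - 0.511 * uwh_initial - 47.35)

def residual_gain(observed_gain, ma, iq, reading, uwh_initial):
    """Observed minus predicted gain: the index attributed to the teacher."""
    return observed_gain - predicted_gain(ma, iq, reading, uwh_initial)

# Hypothetical class means (M.A. in months, I.Q., Reading, UWH initial status).
example = dict(ma=150.0, iq=100.0, reading=70.0, uwh_initial=10.0)
print(predicted_gain(**example))
print(residual_gain(10.0, **example))
```

A positive residual means the class gained more than its pupils' measured abilities would predict, and a negative residual means it gained less.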


SECTION IV 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES 


The criterion developed in the previous 
section represents an objective measure of 
teaching ability. It is the purpose now to 
determine the statistical validity of certain 
teacher measures by studying their relation 
to the established criterion. 

The list of teacher measures studied follows:¹⁴

T1 American Council on Education Psychological Examination for College Freshmen, 1936 Edition.
T2 Psychological Examination Prepared for the Teachers College Personnel Association, Form C, 1938 Edition.
T3 American Council Civics and Government Test, Form B for High Schools and Colleges.
T4 Hartmann: Public Problems Information Test.
T5 Bernreuter: Personality Inventory, Bn.
T6 Bernreuter: Personality Inventory, Bd.
T7 Bernreuter: Personality Inventory, Bs.
T8 Morris Trait Index “L”.
T9 Washburne: Social Adjustment Inventory, Sapich Edition.
T10 A Scale for Evaluating the Personal Fitness of Teachers, unpublished material, University of Wisconsin.
T11 Personality Rating Scale, unpublished materials, University of Wisconsin.
T12 Torgerson Diagnostic Teacher Rating Scale of Instructional Activities.
T13 Almy-Sorenson Rating Scale for Teachers.
T14 Michigan Teacher Rating Scale.
T15 Stanford Educational Aptitudes Test, T-A.
T16 Stanford Educational Aptitudes Test, A-R.
T17 Stanford Educational Aptitudes Test, T-R.
T18 Hartmann: Social Attitudes of Teachers.
T19 Wrightstone: Scale of Civic Beliefs, Forms A and B combined.
T20 Lewerenz-Steinmetz: Orientation Test, 1935 Revision.
T21 Yeager: Scale for Measuring Attitudes Toward Teaching and the Teaching Profession.
T22 A Test of Teacher-Pupil Relationship by Torgerson, Ullsvik, and Wahlstrom.
T23 Sims Score Card for Socio-Economic Status, Form C.

Additional data were obtained as follows:
T24 Age of the teacher.
T25 Teaching experience, in years.
T26 Professional training in years above high school.
T27 Size of school, number of pupils.
T28 Size of class, number of pupils.
T29 Per-pupil cost per year.
T30 Salary of teacher per month.
C Criterion Score based upon pupil change.


TABLE XVI
CORRELATIONS EMPLOYED IN THE PREDICTION OF PUPIL FACTORS IN UWH COMPOSITE CHANGE

[Only column 4 is legible in the source scan: .604, .566, .320, 1.000, -.470.]


TABLE XVII
MEANS, SIGMAS, BETA AND REGRESSION COEFFICIENTS

[Row labels and means are not legible in the source scan. Legible entries: Sigmas 7.30, 5.99, 11.68, 10.53, 7.25; Beta Coefficients .617, -.411; Regression Coefficients .613.]


¹² G. H. Thomson, The Factorial Analysis of Human
Ability (New York: Houghton Mifflin Company, 1939), pp. [page numbers illegible].

¹³ See Table A, original thesis on file, Library, University
of Wisconsin.

¹⁴ For a description of these measures see pp. 15-20.


Table A, Appendix A,¹⁵ reports the raw
scores for the thirty measures listed, the criterion
scores for all teachers, and the means
and standard deviations for each for the group
of 47 teachers participating in the investigation.
The correlations of the teacher measures
with the criterion are given in Tables XVIII
and XIX. The intercorrelations among the
several teacher measures are reported in
Table XX.

¹⁵ In original thesis on file, Library, University of Wisconsin.


TABLE XVIII

CORRELATIONS OF TEACHER MEASURES
WITH CRITERION

N = 47

American Council Psychological Test                 -.10
Teachers College Psychological Test                  .05
American Council Civics and Government Test          ...
Hartmann Information Test                            ...
Bernreuter Personality Inventory Bn                 -.14
Bernreuter Personality Inventory Bd                  .04
Bernreuter Personality Inventory Bs                 -.11
Morris Trait Index L                                -.17
Washburne Social Adjustment Inventory                ...
Scale for Personal Fitness of Teachers               .35
Personality Rating Scale, U. of Wis.                 .30
Torgerson Diagnostic Rating Scale                    ...
Almy-Sorenson Rating Scale for Teachers              ...
Michigan Rating Scale                                ...
Stanford Aptitudes Test T-A                          ...
Stanford Aptitudes Test A-R                          ...
Stanford Aptitudes Test T-R                          ...
Hartmann Social Attitudes Test                       ...
Wrightstone Scale of Civic Beliefs                   ...
Orientation Test, 1935 Edition                       ...
Yeager Scale for Measuring Attitudes
  Toward Teachers and Teaching                       ...
Torgerson Teacher-Pupil Relationship                 ...
Sims Socio-Economic Status                           ...

Additional Data

Age of the Teacher                                   ...
Teaching Experience in Years                         ...
Professional Training above High School              ...
Size of School                                       ...
Size of Class                                        ...
Per Pupil Cost Per Year                              ...
Salary of the Teacher Per Month                      ...

[Entries shown as "..." are not legible in the source scan.]


The correlations between the various 
teacher measures and the criterion may be 
more easily interpreted when grouped under 
the following categories: 


A. Intelligence:
   1. American Council on Education Psychological Examination.
   2. Psychological Examination Prepared for the Teachers College Personnel Association.

B. Knowledge of Subject Matter:
   1. American Council Civics and Government Test.
   2. Hartmann: Information Test on Public Problems.

C. Personality:
   1. Bernreuter Personality Inventory, Bn.
   2. Bernreuter Personality Inventory, Bd.
   3. Bernreuter Personality Inventory, Bs.
   4. Morris Trait Index “L”.
   5. Washburne Social Adjustment Inventory.
   6. A Scale for Evaluating the Personal Fitness of Teachers.
   7. A Personality Rating Scale.

D. Rating Scales for Teachers:
   1. Torgerson Diagnostic Rating Scale of Instructional Activities.
   2. Almy-Sorenson Rating Scale for Teachers.
   3. Michigan Rating Scale.

E. Aptitudes and Attitudes:
   1. Stanford Educational Aptitudes Test, T-A.
   2. Stanford Educational Aptitudes Test, A-R.
   3. Stanford Educational Aptitudes Test, T-R.
   4. Hartmann: Social Attitudes of Teachers.
   5. Wrightstone Scale of Civic Beliefs.
   6. Lewerenz-Steinmetz: Orientation Test, 1935 Revision.
   7. Yeager Scale for Measuring Attitude Toward Teachers and the Teaching Profession.
   8. Torgerson et al.: A Test of Teacher-Pupil Relationship.

F. Socio-Economic Status:
   1. Sims Score Card for Socio-Economic Status, Form C.


TABLE XIX

INTERCORRELATIONS OF TEACHER MEASURES
WITH CRITERION, ARRANGED IN
ORDER OF SIZE

Torgerson Teacher Rating Scale                        ...
Michigan Rating Scale                                 ...
Hartmann Social Attitudes Test                        ...
Almy-Sorenson Teacher Rating Scale                    ...
Test of Personal Fitness (Charters)                   ...
Size of School, Number of Pupils                      ...
Personality Test, U. of Wis.                          ...
Wrightstone Scale of Civic Beliefs                    ...
Salary of Teacher, per month                          ...
Torgerson Teacher-Pupil Relationship                  ...
Yeager Scale for Measuring Attitudes
  Toward Teachers and the Teaching Profession         ...
Experience of the Teacher, in Years                   ...
Size of the Class, Number of Pupils                   ...
Stanford Aptitudes Test T-A                           ...
Washburne Social Adjustment Inventory                 ...
Teachers College Psychological Examination            ...
Bernreuter Personality Inventory Bd                   ...
Age of the Teacher                                    ...
Hartmann Information Test                             ...
American Council Civics and Government Test           ...
Per Pupil Cost                                       -.06
Training of Teacher Above High School                -.09
American Council Psychological Examination            ...
Bernreuter Personality Inventory Bs                  -.11
Stanford Aptitudes Test T-R                           ...
Bernreuter Personality Inventory Bn                  -.14
Stanford Aptitudes Test A-R                           ...
Sims Socio-Economic Status                            ...
Morris Trait Index L                                  ...

[Rank numbers, the entry for the Orientation Test, and the values shown as "..." are not legible in the source scan.]


TABLE XX
INTERCORRELATIONS OF TEACHER TEST SCORES
N = 47

[Table body not recoverable from the source scan.]


A study of the criterion correlations shows
that only seven are statistically significant. Of
these seven measures, five are rating scales.
The other two measures are the Hartmann Social
Attitudes Test, .384, and size of school
(number of pupils), .312. The Wrightstone
Scale of Civic Beliefs, .285; salary of teachers,
.223; Torgerson Teacher-Pupil Relationship
Test, .222; and Yeager Scale for Measuring
Attitudes Toward Teachers and the
Teaching Profession, .221, approach significance.
Taken separately, none of these measures
has any very great validity when
checked against the criterion here employed.
In general the correlations reported for this
study are lower than those reported by
Rostker.* This may arise in part from the

* Leon E. Rostker, [title illegible], University of Wisconsin, Madison, Wisconsin.




fact that a larger amount of the pupil gain
was attributable to the teachers and other uncontrolled
factors in Rostker’s study. Forty-four
per cent of the pupil gain variance is
attributable to the teacher factor in Rostker’s
study and only 24% to the teacher factor in
Rolfe’s study; 27% of the variance is attributable
to pupil factors in Rostker’s study
and 48% in Rolfe’s study. Thirty per cent of
the variance is attributable to errors of measurement
in Rostker’s study and 29% in
Rolfe’s study. It will be recalled that Rolfe’s
teachers were principally teachers in one-room
rural schools; those in Rostker’s study were
in part from two-room state graded schools
and presumably better teachers.


SECTION V 


A STUDY OF THE VALIDITY OF COMBINATIONS
OF SEVERAL TESTS


The correlations between teacher measures 
and the criterion have already been presented 
in Tables XVIII and XIX. The purpose of 
this section of this report is to present data 
relative to the predictive value of certain 
combinations of tests. The first combination 
to be studied is presented in Table XXI. 

By using Aitken’s Method of Pivotal Con- 
densation** the beta coefficients necessary for 
computing the multiple correlations and mul- 
tiple regression equations were obtained. 
Table XXII shows the results of these com- 
putations based upon the nine teacher 
measures selected for study. 

The multiple correlations and their significance
are indicated in Table XXIII. These
data were obtained from the proper multipli- 


** Godfrey H. Thomson, The Factorial Analysis of Human
Ability (New York: Houghton Mifflin Company, 1939), pp.


TABLE XXI 
RELATION BETWEEN CERTAIN MEASURES APPLIED TO TEACHERS AND TEACHING EFFICIENCY 
N = 47 


Torgerson: Diagnostic Teacher Rating Scale
Hartmann: Social Attitudes of Teachers
Wrightstone: Scale of Civic Beliefs
Torgerson: Teacher-Pupil Relationship
Yeager: Scale for Measuring Attitudes Toward Teachers and the Teaching Profession
Morris: Trait Index L
Bernreuter: Personality Inventory Bn
American Council Psychological Examination
American Council Civics and Government Test






TABLE XXII
BETA COEFFICIENTS FOR PREDICTION OF TEACHING SUCCESS


TABLE XXIII
MULTIPLE CORRELATIONS AND THEIR SIGNIFICANCE


Number of
variables    R²      R      1 - R²

    1       .181    .425    .819
    2       .248    .498    .752
    3       .277    .526    .723
    4       .284    .533    .716
    5       .333    .577    .667
    6       .358    .598    .642
    7       .364    .603    .636
    8       .393    .627    .607
    9       .394    .627    .606


*Significant at 5% level; 5 chances in 100 from an uncorrelated random pop. 
**Significant at 1% level; 1 chance in 100 from an uncorrelated random pop. 


cations of the correlations and beta coeffi- 
cients in Table XXII. 
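
The computation described here can be illustrated with a minimal sketch. It assumes the usual relations: the beta coefficients solve the normal equations formed from the intercorrelations of the predictors, and the squared multiple correlation is the sum of each beta multiplied by the corresponding correlation with the criterion. Ordinary Gauss-Jordan elimination is used in place of Aitken's pivotal condensation, and the two-predictor correlations are hypothetical values chosen for illustration, not figures from this study.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's lists are left unchanged.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def multiple_correlation(predictor_corr, criterion_corr):
    """Return (betas, R squared, R) from predictor and criterion correlations."""
    betas = solve_linear_system(predictor_corr, criterion_corr)
    r_squared = sum(b * r for b, r in zip(betas, criterion_corr))
    return betas, r_squared, r_squared ** 0.5


# Hypothetical correlations: two predictors correlating .50 with each other
# and .40 and .30 with the criterion.
predictor_corr = [[1.0, 0.5], [0.5, 1.0]]
criterion_corr = [0.4, 0.3]
betas, r_squared, multiple_r = multiple_correlation(predictor_corr, criterion_corr)
```

Adding a further predictor only requires a larger correlation matrix; the same two steps then give the next multiple correlation in the series.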

Two important facts stand out from Table 
XXIII: (1) With the addition of more vari- 
ables the multiple correlation increases in size; 
and (2) as more variables are added the error 
increases. The increase in the multiple R’s 
and the errors move along fairly parallel 
courses. When the 9th variable is added the
R is no longer significant at the 1% level.

From the relative contribution each inde- 
pendent variable makes to the next multiple 
correlation, it is possible to rank these vari- 
ables according to their potency. Therefore, 
in Table XXIII, the second column ranks 
these independent variables according to their 
contributions as follows: 


1. X_1 Torgerson Teacher Rating Scale.
2. X_2 Hartmann Social Attitudes of Teachers.
3. X_5 Yeager Scale for Measuring Attitudes Toward Teachers and the Teaching Profession.
4. X_3 Wrightstone Scale of Civic Beliefs.
5. X_8 American Council Psychological Examination.
6. X_6 Morris Trait Index L (Leadership).
7. X_4 Torgerson Teacher-Pupil Relationship.
8. X_7 Bernreuter Personality Inventory, Bn.
9. X_9 American Council Civics and Government Test.


It is understood that contributions to the 
multiple correlations will be in relation to 
those contributions made by previous tests. 
The reason for such a small increase in the 
multiple correlation for any particular meas- 
ure will be in terms of the contributions 
already made. Tests which contribute similar 
content may be discovered and use made of 
those items which add to the test results. 
Tests number 4, 7, and 9 appear to contribute 
little to the multiple correlation of .627 in 
the list above. 

Since the beta coefficients tell us the rela- 
tive importance of the contributions made by 
the independent variables in multiple correla- 
tion equations, it is possible to determine the 
meaning of each beta coefficient reported in 
the second column of Table XXII. 







With the use of the beta coefficients ob- 
tained when the multiple correlations were 
determined it is now possible to set up a re- 
gression equation based on the criterion of 
pupil change. When the criterion is expressed 
in standard score units as previously indi- 
cated, it is possible to express the regression 
equation in terms of raw scores on the inde- 
pendent variables—teacher measures—and 
the criterion—dependent variable—in terms 
of standard scores. The general form of this 
regression equation can then be written as:** 


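\[
\bar{X}_0 = \beta_{01}\frac{\sigma_0}{\sigma_1}X_1 + \beta_{02}\frac{\sigma_0}{\sigma_2}X_2 + \cdots + \beta_{0n}\frac{\sigma_0}{\sigma_n}X_n + \left(M_0 - \beta_{01}\frac{\sigma_0}{\sigma_1}M_1 - \cdots - \beta_{0n}\frac{\sigma_0}{\sigma_n}M_n\right)
\]

If the mean of the criterion or dependent
variable equals zero and its standard deviation
equals one, this formula may be written
as:

\[
\bar{z}_0 = \frac{\beta_{01}}{\sigma_1}X_1 + \frac{\beta_{02}}{\sigma_2}X_2 + \cdots + \frac{\beta_{0n}}{\sigma_n}X_n - \left(\frac{\beta_{01}}{\sigma_1}M_1 + \frac{\beta_{02}}{\sigma_2}M_2 + \cdots + \frac{\beta_{0n}}{\sigma_n}M_n\right)
\]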


» ES a ee regres- 
sion equation by which predictions were made in this investi- 




By substituting the proper beta coefficients, 
means, and sigmas in the above formula the 
following multiple regression equation results: 


\[
\bar{z}_{0\cdot 123\ldots 9} = .032X_1 + .021X_2 + .006X_3 + .008X_4 + .496X_5 - .006X_6 - .001X_7 - [\text{illegible}]\,X_8\ [\pm]\ .002X_9 - 6.413
\]


This equation is tested by substituting for
the unknowns the appropriate mean scores
obtained from the group of 47 teachers. The
prediction of zero confirms the fact that the
predicted score for the average teacher would,
on a basis of standard scores, be equal to
zero.
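
The check described above can be sketched in a few lines. The sketch assumes the criterion is in standard-score form, so that each raw-score weight is the beta coefficient divided by the sigma of its test, and the constant is the negative sum of the weights multiplied by the test means. The betas, means, and sigmas below are hypothetical values for three tests, not the values of this study.

```python
def raw_score_equation(betas, means, sigmas):
    """Convert beta coefficients to raw-score weights and a constant term.

    Assumes a criterion with mean zero and standard deviation one.
    """
    weights = [b / s for b, s in zip(betas, sigmas)]
    constant = -sum(w * m for w, m in zip(weights, means))
    return weights, constant


def predict(weights, constant, scores):
    """Predicted criterion standard score for one teacher's raw test scores."""
    return constant + sum(w * x for w, x in zip(weights, scores))


# Hypothetical betas, means, and sigmas for three teacher measures.
betas = [0.40, 0.25, -0.10]
means = [50.0, 100.0, 3.0]
sigmas = [10.0, 20.0, 0.5]

weights, constant = raw_score_equation(betas, means, sigmas)

# The average teacher is predicted at zero, which is the check in the text.
prediction_at_means = predict(weights, constant, means)

# A teacher one sigma above the mean on the first test only is predicted
# at the first beta coefficient.
prediction_one_sigma_up = predict(weights, constant, [60.0, 100.0, 3.0])
```

A prediction that departs noticeably from zero at the means would point to an arithmetic slip in the weights or the constant.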

A comparison of the predicted score of 
teaching ability with the actual score ob- 
tained will again test the accuracy of the 
regression equation for the group from which 
it was derived. With a multiple correlation
coefficient of .627 and a standard error of
estimate of .779, it is possible to test the 
accuracy of the equation as a predicting in- 
strument by substituting the actual scores of 
a teacher obtained on the various tests. 
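
The standard error of estimate quoted here can be checked with a short sketch. The formula itself is not printed in the text; the sketch assumes the familiar form, the criterion sigma multiplied by the square root of one minus the squared multiple correlation, with the criterion sigma equal to one because the criterion is in standard-score units. Under that assumption the reported pairs (.627 with .779, and .640 with .768 for the second battery) are reproduced.

```python
import math


def standard_error_of_estimate(multiple_r, criterion_sd=1.0):
    """Standard error of estimate, assuming sd * sqrt(1 - R^2)."""
    return criterion_sd * math.sqrt(1.0 - multiple_r ** 2)


# Multiple correlations reported for the two batteries of tests.
first_battery = standard_error_of_estimate(0.627)
second_battery = standard_error_of_estimate(0.640)
```

Rounded to three decimals, the two results agree with the figures .779 and .768 given in the text.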


TABLE XXIV 
ILLUSTRATION OF THE USE OF THE REGRESSION EQUATION 


11 


66 
58 
109 
287 
3.2 
53 
216 
253 
65 


—.16 


scores_... —1.68 .91 . 03 


Key for tests listed in Table XXIV
Torgerson Teacher Rating Scale.
Hartmann Social Attitudes of Teachers.
Wrightstone Scale of Civic Beliefs.
Torgerson Teacher-Pupil Relationship.
Yeager Scale for Measuring Attitudes.
Morris Trait Index L.
Bernreuter Personality Inventory Bn.


Teacher 
16 


84 
222 
3.2 
105 
5 
357 
201 
-87 —.175 


—2. 00 




American Council Psychological Examination. 
American Council Civics and Government Test. 









Scores of teachers selected at random are
substituted below to illustrate the use of this
regression equation as a predicting instrument
of teaching ability: S.D.est = .779, using
the formula by Guilford.** One would not expect
the same degree of accuracy for a new
group of teachers.


The second part of this section is concerned 
with the use of other teacher measures 
arranged in a different order than that used 
in the first prediction of teaching ability. 
Here the ten teacher tests having the highest 
absolute correlation with the criterion of 
teaching ability based on pupil change were 
used. This list was secured by eliminating 
the American Council Civics and Government 
Test and by adding the test of Personal Fit- 
ness and also Sims Score Card for Socio- 
Economic Status. The following teacher 
measures were selected: 


Torgerson Teacher Rating Scale 
Hartmann Social Attitudes of Teachers 


There appears to be considerable evidence 
supporting the significance of the multiple 
correlations obtained from both of the series 
of tests used in this study for the purpose of 
predicting teaching ability. 

When the beta coefficients, means, and 
sigmas are substituted in the prediction 
formula and the proper multiplications and 
divisions are carried out, the regression equa- 
tion becomes: 

\[
\bar{z}_{0\cdot 12\ldots 10} = .038X_1 + .018X_2 - .042X_3 + .009X_4 + .007X_5 + .449X_6 - .011X_7 - .031X_8 - .001X_9 - .003X_{10} - 5.690
\]


Test scores of teachers selected at random
are substituted below to illustrate the use of
this regression equation as a predicting instrument
of teaching ability. S.D.est = .768,
based on the multiple R of .640 (Table XXVI).


Personal Fitness Rating Scale, University of Wisconsin
Wrightstone Scale of Civic Beliefs
Torgerson Teacher-Pupil Relationship
Yeager Scale for Measuring Attitudes Toward Teachers
Morris Trait Index L
Sims Score Card for Socio-Economic Status
Bernreuter Personality Inventory Bn
American Council Psychological Examination


With the use of Aitken’s Method of Pivotal 
Condensation as in the first part of this sec- 
tion, another set of beta coefficients necessary 
for computing multiple correlations and mul- 
tiple regression equations was obtained. 
Table XXV shows the beta coefficients from 
which the multiple correlations have been 
obtained. The multiple correlations are reported
in Table XXVI.


Data in Table XXVI have been obtained
by the proper multiplication and addition of
the beta coefficients and the correlations with
the criterion indicated in Table XXV. The
same observation should be made here as
earlier: the error increases with the addition
of new variables, and with the addition of
variables 9 and 10 the multiple R ceases to
be significant at the 1% level.
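
The columns of Table XXVI can be reproduced approximately with the sketch below. It takes the printed column headings at face value: the ratio of R² to 1 − R², and the factor (45 − p)/p, where p is the number of teacher measures in the combination. Treating the product of these two columns as the variance ratio to be referred to a significance table is an assumption of this sketch; the text does not state the final step.

```python
def significance_columns(multiple_r, n_predictors, df_total=45):
    """Columns used in testing a multiple correlation for significance.

    df_total=45 follows the printed table heading (45 - p) / p; the
    product of the two ratios is an assumed final step.
    """
    r_squared = multiple_r ** 2
    variance_ratio = r_squared / (1.0 - r_squared)
    df_ratio = (df_total - n_predictors) / n_predictors
    return {
        "r_squared": r_squared,
        "variance_ratio": variance_ratio,
        "df_ratio": df_ratio,
        "f_ratio": variance_ratio * df_ratio,
    }


# Multiple correlations reported for one measure and for all ten measures.
one_measure = significance_columns(0.425, 1)
ten_measures = significance_columns(0.640, 10)
```

Small differences in the third decimal from the printed table are to be expected, since the sketch starts from R already rounded to three places.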


J. P. Guilford, Psychometric Methods (New York:
McGraw-Hill Book Company, Inc., 1936), p. 385.




Considerable consistency appears to be 
found between obtained scores and predicted 
scores reported in Tables XXIV and XXVII. 
Thus it seems that the equations here used 
in predicting teaching ability in the field of 
the Social Studies may have some justification.
With the addition of rating scales and a
different combination of teacher measures
than that used in this study it may be possible
to obtain reliable measures of teaching
efficiency.

Other regression equations may be formed
from data represented in this study, but the
above equations would seem to indicate that
many measures, each contributing only moderately
to teaching ability, may be successfully
combined to form more valid measures
of teaching ability. It is entirely possible that
while simple measures may contribute only
slightly to the measurement of teaching efficiency,
items of such tests may be combined
into statistically significant instruments.







a 3 2 


-411 -428 .335 

- 261 .288 .275 
—.126 —.116 

.178 


TABLE XXVI
MULTIPLE CORRELATIONS FOR VARIOUS COMBINATIONS OF TESTS AND THEIR SIGNIFICANCE

  p     R²      R      1 - R²   R²/(1 - R²)   (45 - p)/p

  1    .181    .425    .819       .221         44.000
  2    .248    .498    .752       .330         21.500
  3    .252    .502    .748       .337         14.000
  4    .282    .531    .718       .393         10.250
  5    .287    .536    .713       .403          8.200
  6    .334    .578    .666       .502          6.500
  7    .368    .606    .632       .582          5.429
  8    .396    .629    .604       .656          4.625
  9    .404    .636    .596       .678          4.000
 10    .410    .640    .590       .693          3.500


*Significant at 5% level; 5 chances in 100 from an uncorrelated random population.
**Significant at 1% level; 1 chance in 100 from an uncorrelated random population.


TABLE XXVII 
ILLUSTRATION OF THE USE OF THE REGRESSION EQUATION 


Teacher 
11 


66 
58 
3. 
109 
287 
3. 
53 
18 
216 


253 256 
Criterion 
scores._... —1.70 ? —.16 J .87 —1.75 
Predicted 
scores.... —1.89 é .10 . .72 —2.23 









SECTION VI 
SUMMARY AND CONCLUSIONS 
The main objective of this study was to 
determine the validity of certain measures of 
teaching ability as correlated with pupil 
change as a Criterion. Intercorrelations were 
calculated between pupil-change and each of 
the teacher measures and teams of such 
measures. From these data it was possible to
determine those measures which seemed to
have most value as measures of teaching
ability. The measures possessing the highest
validity were:
1. Torgerson Teacher Rating Scale.
2. Michigan Rating Scale.
3. Hartmann Social Attitudes.
4. Almy-Sorenson Teacher Rating Scale.
5. A Personal Fitness Rating Scale.
6. Size of school (number of pupils).
7. A Personality Rating Scale.
8. Wrightstone Scale of Civic Beliefs.
9. Salary of the teacher.
10. Torgerson Teacher-Pupil Relationships.
11. Yeager Scale for Measuring Attitudes Toward Teachers and the Teaching Profession.


By combining the following measures into
a multiple regression equation a multiple
correlation of .627 was obtained, with an
S.D.est of .779:


1. Torgerson Teacher Rating Scale.
2. Hartmann Social Attitudes of Teachers.
3. Wrightstone Scale of Civic Beliefs.
4. Torgerson Teacher-Pupil Relationship.
5. Yeager Scale for Measuring Attitudes Toward Teachers and the Teaching Profession.
6. Morris Trait Index L.
7. Bernreuter Personality Inventory, Bn.
8. American Council Psychological Examination.
9. American Council Civics and Government Test.


A second group of teacher measures gave
a somewhat higher multiple correlation. The
tests in this second set were selected on the basis
of their correlation with the criterion and
have been arranged in the order of size.
These measures include a large variety of 
qualities and abilities, some of which appear 
to show a definite relationship with the cri- 
terion of teaching ability used in this study. 
A list of the ten teacher measures making up 
the group is given below: 


1. Torgerson Teacher Rating Scale.
2. Hartmann Social Attitudes of Teachers.
3. Personal Fitness Rating Scale.
4. Wrightstone Scale of Civic Beliefs.
5. Torgerson Teacher-Pupil Relationship.
6. Yeager Scale for Measuring Attitudes Toward Teachers and the Teaching Profession.
7. Morris Trait Index L.
8. Sims Score Card for Socio-Economic Status, Form C.
9. Bernreuter Personality Inventory, Bn.
10. American Council Psychological Examination.


This combination gave a multiple correlation
of .640 with a standard error of estimate
of .768. It is also interesting to note that the
obtained scores of teaching ability (the
criterion scores) correlate with the predicted
scores of the 47 teachers to the same figure
as the multiple correlation, .640.
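
The agreement noted here is a general property of a least-squares regression equation when it is applied to the group from which it was derived: the correlation between obtained and predicted scores equals the multiple correlation. A minimal sketch with two predictors follows. The scores for six teachers are hypothetical, and the closed-form solution for two betas is used only to keep the example short.

```python
import math


def standardize(values):
    """Convert raw scores to standard scores (population sigma)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def correlation(a, b):
    """Product-moment correlation of two equally long score lists."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def two_predictor_betas(r01, r02, r12):
    """Beta coefficients for two predictors from their correlations."""
    denom = 1.0 - r12 ** 2
    return (r01 - r02 * r12) / denom, (r02 - r01 * r12) / denom


# Hypothetical scores for six teachers on two tests and the criterion.
test_one = [10.0, 12.0, 15.0, 11.0, 18.0, 14.0]
test_two = [3.0, 5.0, 4.0, 6.0, 7.0, 5.0]
criterion = [-1.2, -0.3, 0.4, -0.5, 1.5, 0.1]

r01 = correlation(criterion, test_one)
r02 = correlation(criterion, test_two)
r12 = correlation(test_one, test_two)
beta1, beta2 = two_predictor_betas(r01, r02, r12)
multiple_r = math.sqrt(beta1 * r01 + beta2 * r02)

# Predicted standard scores from the regression equation in beta form.
z1, z2 = standardize(test_one), standardize(test_two)
predicted = [beta1 * a + beta2 * b for a, b in zip(z1, z2)]
r_obtained_predicted = correlation(criterion, predicted)
```

The two figures agree to within rounding error, so the agreement reported in the text is a check on the arithmetic rather than independent evidence of validity.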


Other interesting and valuable combina- 
tions of these teacher measures are possible 
and some of them may prove valuable in pre- 
dicting teaching success as well as assisting 
in the better training of teachers in service. 


Although the data in this study are at 
times conflicting, certain conclusions relative 
to the evaluation of teaching efficiency seem 
warranted: 


1. Personality as here defined and measured seems to possess a positive relation to good teaching (r = .35; r = .30).

2. Rating scales, when used by experienced and competent supervisors for the purpose of evaluating teacher efficiency, give a positive correlation (r = .36 to r = .43).

3. Social attitudes as measured by the Hartmann and Wrightstone Scales appear to be related to teaching efficiency (r = .38; r = .29).

4. Size of the school appears to possess significance in evaluating teaching efficiency as here measured (r = .31).

5. Teacher-pupil relationship as measured by the Torgerson Teacher-Pupil Relationship Test is positively correlated with teaching efficiency but not in an amount that is statistically significant (r = .22).

6. Attitudes toward teachers and the teaching profession as measured by the Yeager Scale are positively correlated with teaching efficiency but not in an amount that is statistically significant (r = .22).

7. The Bernreuter Personality Inventory, Bn (neurotic tendencies), shows a small negative relationship (r = -.14).

8. Dominance as measured by the Bernreuter Scale does not appear to contribute to teaching efficiency (r = .04).

9. Social adjustment as measured by the Washburne Scale seems not to be related to teaching efficiency (r = .06).

10. Intelligence as measured by the American Council Psychological Examination seems not to be related to teaching efficiency (r = -.10).

11. The age and experience of the teacher contribute little when measured against the criterion of pupil change as set up in this study (r = .01; r = .10).

12. Leadership as measured by the Morris Trait Index L is negatively correlated with teaching efficiency (r = -.17).

13. There appears to be considerable evidence that the teacher in these rural schools contributes less to pupil success than do teachers in the school where there is a single grade to be taught. This fact may throw some light upon the inconsistencies between this study and that reported by Rostker.




THE MEASUREMENT OF TEACHING ABILITY 
STUDY NUMBER THREE 


C. V. LADUKE 


SECTION I 
THE PROBLEM 


This investigation was undertaken as a further
check on the results of two earlier studies
of teaching ability. The results of the two
earlier studies seemed to be in essential agreement
in most respects; but further research
seemed desirable. The purpose of this particular
study was to examine the validity of a selected
battery of tests shown by earlier investigations
to have particular promise.

The results of the study should assist in 
answering the following questions: 


1. What relationships do certain teacher 
factors, like intelligence, have to an 
objective criterion of teaching efficiency? 

2. How do supervisory ratings agree with an 
objective criterion of teaching efficiency? 

3. What validity may certain selected teacher 
measures have when validated against an 
objective criterion of pupil change? 


SECTION II 


EXPERIMENTAL DESIGN AND 
PROCEDURE 


The design of this study was controlled to 
a large extent by the criterion of teaching abil- 
ity which the investigator decided to use. The 
criterion was that of pupil change. The investi- 
gation consisted, therefore, chiefly of studying 
the relationship between pupil change and a 
number of teacher qualities under partially 
controlled conditions. 

The investigation was a part of a quite elaborate
study of radio in education, undertaken
during the years 1937-39 under the supervision 
of the Department of Education and Speech of 
the University of Wisconsin. It was possible by 
careful planning to set up a group of teachers 
See reports by L. E. Rostker and J. F. Rolfe reproduced

H. L. Ewbank, and others, Radio in the Classroom
(Madison: University of Wisconsin Press, 1942), 203 pp.


in the social science area in such a fashion that 
it would not only meet the purposes of the 
radio study but would also serve the purposes 
of this study. The study had the advantage of 
being quite adequately financed, making it pos- 
sible to provide exceptionally good supervision 
of the project and collection of data. 


SELECTION OF THE SCHOOLS 


Since teachers were to be evaluated in terms
of changes produced in their pupils, a departmentalized
system where pupils had more than
one teacher would complicate the problem, so
it was decided to confine the study to one-teacher
rural schools. It was quite essential that
the schools be located near the seat of operations
to reduce travel and expense; so the
names of all the rural schools having no radios
in seven Wisconsin counties near Madison were
put in a box and, after being thoroughly mixed,
the names of the required number of schools
were drawn at random. Extras were drawn to
offset those who might not cooperate. Upon
contacting the teachers of these schools, forty-one
were glad to take part in the study and
were enrolled. Later, seven of these schools
were dropped due to withdrawal of pupils or
absence of pupils from tests, so complete data
were secured for only 34 7th- and 8th-grade
classes enrolling a total of 200 pupils. The
enrollment in each school, the enrollment in
the participating class, the assessed valuation
of each district, and the annual district expenditure
are shown in Table I. Inspection of this
table reveals a condition of heterogeneity in all
of the factors listed that might be quite disturbing
when one considers that the teachers are
to be measured by pupil change, and that pupil
change might be determined in part by these
factors. The disparity in the district
assessed valuation would ordinarily be interpreted
as differences in ability to support
schools. The state aid law in Wisconsin, however,
is so drawn and administered that the
school tax rate is practically no larger in schools
of low valuation than in schools of large valuation,
even though equal amounts are spent for
school support.


TABLE I
INFORMATION RELATIVE TO SCHOOLS, CLASSES, AND FINANCIAL SUPPORT


Participating
Enrollment

Class size varies from the smallest class of
three members to the largest of nine, and the
classes, while quite different relatively, are still
in the small class category. Experimental evidence,
so far as could be determined, as to size of class and
teaching efficiency shows no advantage to
classes of either extreme.

In so far as could be determined, the same
line of reasoning applies to the effect of the
size of the school. While enrollment varied
from thirteen to thirty-seven, no evidence could
be found indicating that either extreme would
have any particular advantage.

Socio-economic factors that may be operative 
in the schools employed in this investigation 
are depicted in Table II. Review of these data 
reveals great uniformity from school to school. 
Occupationally the various districts are much 


“Our Equalization Law,” Wisconsin Journal of
Education (March, 1933), pp. 313-314.


Assessed Total District
Valuation of Expenditure
District 1937-1938
$250,000 $1490

970


215,000
291,000
200,000

4 
4
7 
7 
5 
5 
6 
7 
9 
7 
5 
6 
7 
8 
6 
7 
6 
3 
8 
5 
5 
5 
5 
7 
5 
7 
4 
5 
8 
6 
5 
4 
5 
7 


alike, being predominantly agricultural; eco- 
nomic independence in all districts is indicated 
by the very few on relief; nationality (extrac- 
tion) varies from community to community, 
but since the percentage of foreign born is so 
small, it is doubtful if the extraction factor 
should be given much weight. In other words, 
they are all Americans. 


THE PARTICIPATING TEACHERS 


The random selection of schools described 
above resulted in the selection of the teachers 
as well, since the schools are all one-teacher 
schools. Table III summarizes information col- 
lected with reference to the teachers. In age the 
teachers varied from 21 to 44; in salary from 
$80 to $115 a month; in experience from one 
to 16 years; in tenure of present position from 
none to five years; in certification from county 
third to state unlimited certificate; and 
in professional training for teaching, from two 









TABLE II
DATA RELATING TO SOCIO-ECONOMIC FACTORS


Columns: Percent Foreign Born; Percent Relief or W.P.A.; Predominating Nationality in Community


* Data for school number 18 not available. 


semesters in county training school to eight 
semesters in teachers college and university. Of 
the thirty-four teachers, four were men. 


The typical teacher then was a twenty-four 
year old woman, with one year of professional 
training in a county normal school, holding a 
first-grade county teacher's certificate, having 
three years of teaching experience previous to 
the year of the investigation, having taught one 
year in the present position, and receiving a 
salary of ninety dollars a month or $810 a year. 


PREPARATION OF COURSE OF STUDY 


The outline of the course of study for Community
Living, composed of eight units, was
prepared by members of the State Department
of Public Instruction, a committee of teachers
of the social studies who volunteered to assist
in the planning of the series, and members of
the radio project research staff. The attempt was 


Percent Foreign Born 


Nationality in Community 


5% (Polish). 
20% (German) 


made to portray the progressive development 
of the political and social organization of our 
democratic society. The child’s responsibility in 
the functioning of democratic society was to be 
emphasized. 

The following outline and schedule was 
developed: 


Unit I. Your Family, Home, and Community 
Sept. 26. You and your family 

Oct. 3. You and your home 

Oct. 10. Your home and your community 


Unit II. How the Community Serves You


Oct. 17. Safe highways 

Oct. 24. Protection of life and property 

Oct. 31. Education—your opportunity 

Nov. 7. Recreation—a new community service 
Nov. 14. Health—a community problem 







TABLE III 
DATA RELATIVE TO PARTICIPATING TEACHERS


Training beyond H. 8. 


in semesters 


Teaching Tenure 


Sex Monthly Experience
Salary




85 
105 
95 
115 
110 




F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
F 
M 
M 
F 
F 
F 
M 
F 
F 
F 
F 
F 
F 
F 
F 
F 


Present 
School 




County Teacher Other 
Normal College 


License 
Held 
County (1) 
County (1) 
Rural 
pears f (1) 
pone 
ounty 
County 
County 
County 
County 
County 
County 
State 
State 
County 
County 
County 
County 
County 
County 
County 
County 
County 
County 
Coun 


County 
County 


* Schools number 28, 31, and 39 were omitted to secure homogeneity. 


Unit III. How the State Serves You


Nov. 21. Conservation of natural resources 

Nov. 28. State protection for producer and 
consumer 

Dec. 5. Services of the state university 

Dec. 12. The Wisconsin Social Security Law 


Unit IV. How the National Government 
Serves You 


Jan. 2. Uncle Sam carries the mail 
Jan. 9. Government research serves your home 
Jan. 16. Government regulation: labor, communication, etc.

Jan. 23. Uncle Sam cares for the unemployed 


Unit V. How Your Government is Organized 
and Supported 


Feb. 6. Managing your local government 
Feb. 13. Managing your state government 
Feb. 20. Managing your national government 
Feb. 27. Paying the bill together—taxation 


Unit VI. Making a Living in Your Community 

March 6. Making a living in the country— 
agriculture 

March 13. Making a living in the city—Wis- 
consin industries 

March 20. Workers’ problems in town and 
country 

March 27. Buying and selling together—co- 
operatives 


Unit VII. Social and Political Groups
April 3. Nationality groups in Wisconsin 
April 10. Social life in your community 
April 17. Political parties 
April 24. Your part in democracy 

Unit VIII. Your Community and World 
Society 


May 1. Your state and world markets 
May 8. Your country and world problems 
May 15. Your part in the world community 







The objectives of the course were formulated 
by members of the research staff who attended 
the Progressive Education Workshop held in 
Bronxville, New York, during June and July, 
1938. Leaders in the field of social studies who 
were in attendance at the conference assisted 
in a large measure in the formation of these 
objectives. 

The following specific objectives for Unit
III, How the State Serves You and Your Community,
are illustrative of the objectives developed:*
SPECIFIC OBJECTIVES 
A. Functional Information: 


(1) To develop an understanding of how 
the state serves its citizens and their 
many interests and needs. 

(2) To indicate the ways in which the 
services of the state are related to the 
services of local government. 

(3) To indicate the community needs which 
make state services necessary. 

(4) To indicate the manner in which the 
state provides for the conservation of its 
natural resources. 

(5) To indicate the ways in which the state
attempts to protect the interests of the
producer and consumer.

(6) To indicate some of the services pro- 
vided by the state university for the 
citizens of the state. 

(7) To indicate the manner in which the 
state provides for social welfare—social 
security. 

(8) To indicate the deficiencies in the serv- 
ice of the state. 


B. Interests: 


(1) To develop an interest in the functions 
of the state government. 

(2) To develop an interest in the ways the 
state protects our natural resources. 

(3) To become interested in the ways in 
which the state protects the producer 
and the consumer. 

(4) To become interested in the services 
provided by the state university. 

(5) To become interested in the problem of 
providing for the social welfare of its 
citizens. 

*For complete list of objectives, see Teachers’ Manual,
original thesis on file in the University Library,
University of Wisconsin.


C. Appreciations: 


(1) To recognize the value of the services 
of the state to the individual and to the 
community. 

(2) To recognize the importance of con- 
serving our natural resources. 

(3) To develop a recognition of the state's 
responsibility for social welfare. 

(4) To recognize the importance of the 
services provided by the state university. 

(5) To develop an appreciation of the interdependence
of the producer and the
consumer.

(6) To develop an appreciation of the indi- 
vidual contributions to the growth of 
state services. 


D. Attitudes: 


(1) To develop an attitude of cooperation 
toward state services. 

(2) To develop an attitude of concern with 
regard to state activities. 

(3) To develop a sense of responsibility in 
improving the services of the state. 

(4) To promote an attitude of concern regarding
the conservation of natural resources.

(5) To develop an attitude of consideration 
of the rights and needs of the different 
interest groups within the state in regard 
to governmental services. 


COLLECTION OF PUPIL DATA


Based upon objectives such as those illus- 
trated above, objective tests measuring appre- 
ciations, attitudes, information, and interests* 
were constructed. 

These tests, arranged in two test booklets, 
were administered to all the pupils in the study 
by field investigators during the week preced- 
ing October 17, 1938. The purpose of these 
tests was to secure a measure of the status of 
the pupils at the beginning of the experimental
period, during which the course in Community
Living was presented. One thirty-five minute
class period per day was devoted to the study 
of Community Living. Each teacher was free 
to present the material by any method she 
might choose, but not more than one class 
period a week was to be devoted to any one 
topic. 

* See Section III for copies of tests, and Appendix B for a
discussion of their construction, in original thesis on file in
University Library, University of Wisconsin.







During the week following April 24, 1939, 
field investigators again administered the test
that had been used at the beginning of the 
teaching period. Thus a measure of the pupil 
status was secured at the end of the period of 
instruction. Pupil change during the interven- 
ing six month period was determined by sub- 
tracting the scores on the pre-test from those 
on the final test. 


Shortly after the pre-test was administered,
field workers gave the pupils the Kuhlmann-
Anderson Intelligence Test from which each
pupil's M.A. and I.Q. were determined.


COLLECTION OF TEACHER DATA 


To measure teacher qualities the teachers 
were assembled at central meeting places, usu- 
ally the county seat, and tests were administered 
by field workers. A measure of the teacher's 
intelligence was secured by administering the 
American Council Psychological Examination.* 
The teacher’s knowledge of mental hygiene was 
secured by administering Torgerson’s Theory 
and Practice of Mental Hygiene test. Yeager’s 
Scale for Measuring Attitude toward Teachers 
and Teaching was administered to secure a 
measure of the teacher's interest in her work. 


Besides these measures which were secured 
under controlled conditions, two measures were 
sent to the teachers by mail: Harnly’s State- 
ments about Education, which purports to give 
a liberal-conservative position measure, and 
Jackson’s Social Proficiency Test, which aims to 
measure “consideration for others”. Since these
tests have no right or wrong answers, but in- 
stead reflect the individual teacher's attitude, 
unsupervised administration appears to be a 
valid procedure. 

Besides these teacher measures, secured 
through paper and pencil tests, ratings of the 
teachers were secured from the county superin- 
tendent and the county supervising teacher. 
Two copies of three rating blanks, the Torger-
son Diagnostic Rating Scale, the Almy-
Sorenson Rating Scale, and the Michigan
Rating Scale, were sent to the county superin-
tendent under whom the teacher worked, with
the request that he and the supervising teacher
make separate ratings for each of the teachers 
participating in the study. These ratings were 
to be based not on any one visitation but rather 
upon their cumulative impressions. Since these 


5 All teacher measures are described in Section III of this 
report. 




officers know most of their teachers well, such 
ratings should be as reliable as it is possible 
to obtain. 


SECTION III 
DESCRIPTION OF PUPIL AND TEACHER 


PUPIL TESTS

The pupil tests used in this investigation 
assume great importance since the changes in 
the pupils are determined by these tests and the 
teachers, in turn, are measured by the changes. 
Two types of pupil tests were used: (a) those
measuring factors conditioning pupil achieve-
ment and (b) those measuring pupil achieve-
ment.


The Kuhlmann-Anderson intelligence test
for seventh and eighth grades* was used to
measure intelligence.


A specially constructed test of information, 
appreciations, attitudes, and interests as related 
to community living was used in measuring 
pupil achievement. The questions were assem-
bled into two test booklets,⁷ labeled “Social
Studies Questionnaire”, Form A and Form B, 
with nothing to indicate to the pupil whether a 
question measured information, appreciation, 
interest, or an attitude. The questions in the 
last three named areas were placed in random 
order so as to give minimum assistance to the 
pupil as to their character. Table IV below 
identifies the questions by areas. The reason for 
dividing the questionnaire into A and B forms 
was simply to make it possible to administer the
tests in two different testing periods, thus 
avoiding too prolonged testing at any one time. 


PREPARATION AND VALIDATION OF THE PUPIL 
ACHIEVEMENT MEASURES 


The test items in the questionnaire described 
above were constructed and validated by mem- 
bers of the Wisconsin radio project staff* in 
attendance at the Progressive Education Asso- 
ciation summer workshop, Bronxville, New 
York, during June and July, 1938. The follow-
ing definitions were accepted: 

*F. Kuhlmann and Rose Anderson, Kuhlmann-Anderson
Test, Fourth Edition, Grades VII and VIII. (Minneapolis,
Minnesota: Educational Test Bureau, Inc., 1927).

7 Copies of these question booklets may be found in Appen-
dix D, original thesis on file in the University Library, Uni-
versity of Wisconsin.

8 A. G. Hellfritzsch, “Preliminary Report on Community
Living Section of Radio Study,” Unpublished material, Uni-
versity of Wisconsin, 1940.







TABLE IV
DISTRIBUTION OF ITEMS IN THE SOCIAL STUDIES QUESTIONNAIRE

[Table body not recoverable from the scan. Legible column headings:
Attitude, 90 items; Appreciation; Information, 110 items; each divided
into Form A and Form B. Legible entries: under Appreciation, 120 and
121; under Information, Form A, all of Part II, and Form B, all of
Parts II and III.]


Functional Information was taken to mean:

Facts or information in a given area useful
in everyday living.

Interests were taken to mean:

Desire to extend or intensify one’s knowl-
edge about or experience with a given
area.

Appreciations were taken to mean:

Recognition of social significance of a given
area.

Attitude was taken to mean: 

A mental set which conditions the direction 
that one’s behavior (mental or physical) 
will take when responding to some stim- 
ulus within the area. 


Based on these definitions, objectives in the 
different areas were established and test items 
to determine status of pupils with reference to 
these objectives were drawn up. National
leaders in the field of social studies, in attend-
ance at the conference, contributed liberally to
the crystallization of definitions, objectives, and
test items. Thus a preliminary form of the test
consisting of 110 interest, 112 attitude, 120
appreciation, and 204 information items was
constructed. The test was then administered to
two hundred seventh- and eighth-grade pupils
and the items validated by the upper and lower
third method, using the total score for each
type of items as the criterion score in validating
items of that type. To be retained in the test
the item difficulty had to exceed 20% for the
total group and 10% for the upper third. Dis-
crimination was considered significant at the
1% level.
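
The upper and lower third procedure described above can be illustrated with a short sketch. The function name, the response layout (one row of 0/1 item scores per pupil), and the use of a plain difference in proportions as the discrimination index are assumptions made for illustration only. The source does not say which significance test was applied at the 1% level, so none is attempted here.

```python
def item_analysis(responses, item):
    """Upper/lower-third analysis for one item.

    responses: list of per-pupil lists of 0/1 item scores (hypothetical layout).
    item: index of the item being examined.
    """
    third = len(responses) // 3
    if third == 0:
        raise ValueError("need at least three pupils")

    # Criterion score = total score on all items of the same type.
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:third], ranked[-third:]

    def proportion(group):
        # Share of the group answering the item correctly.
        return sum(pupil[item] for pupil in group) / len(group)

    return {
        "total": proportion(ranked),
        "upper": proportion(upper),
        "lower": proportion(lower),
        "discrimination": proportion(upper) - proportion(lower),
    }


# Six hypothetical pupils, three items each.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
result = item_analysis(scores, item=0)
```

An item would then be kept or dropped by comparing `result["total"]` and `result["upper"]` with whatever retention limits are adopted.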

The items that were retained, namely: 58 
interest, 90 attitude, 94 appreciation, and 110 
information were put into the booklet form 
described earlier in this section and constituted 
the pupil tests used to measure pupil growth. 

The reliabilities of the tests, determined by 
the split-halves method for 150 pupils selected 
at random from the radio study are listed in 
Table IV. 


TABLE LXI 
RELIABILITIES OF PUPILS’ TESTS 


                 Pre-test    Final test    Gain

Appreciation       .80          .88         .64
Attitude           [?]          .81         .62
Information        .91          .84         [?]
Interest           [?]          .89         .53


TEACHER TESTS 


Five teacher tests were administered to the 
teachers participating in the investigation. A 
brief description of each test and the reason 
for its inclusion follows: 


1. The American Council on Education Psy-
chological Examination for College Freshmen,
1936 Edition.⁹

2. Theory and Practice of Mental Hygiene.¹⁰

3. A Scale for Measuring Attitudes toward
Teachers and the Teaching Profession.¹¹

4. Statements about education.¹² This in-
strument is an attitude scale consisting of
eighty statements about education divided into
four parts: “Some of Education,”
“Some General Educational Policies,” “What
Shall We Teach?” and “How Shall We
Teach?” Each division contains ten statements
worded from a progressive position and ten
worded from a conservative position. The
teacher is asked to respond to the statements
by indicating agreement or disagreement on a
five point scale. In scoring, one point is
assigned to the most liberal reaction and five
points to the most conservative reaction. Low
scores on the whole test, then, indicate a liberal
position, while a high score indicates a con-
servative position. The author reports split
halves reliability of .87 ± .02 which, when
corrected by the Spearman-Brown formula,
becomes .93 ± .01. The scale yields two kinds
of data: the total score indicating a general
attitude of progressivism or conservatism and
a part score for each of the four divisions indi-
cating general attitude in each of these areas.
The scale was included in this study to discover
if the progressivism or conservatism of the
teacher as here measured has any relation to
teaching efficiency.

9 American Council on Education, Washington, D.C. For
full description of this test see ...
10 T. L. Torgerson, Department of Education, University of
Wisconsin. Unpublished. For full description of test see ...
11 T. C. Yeager, An Analysis of Certain Traits of Selected
High School Seniors Interested in Teaching, (New York:
Bureau of Publications, Teachers College, Columbia Univer-
sity, 1935). For full description of this test see ...
12 P. W. Harnly, “Attitudes of High School Seniors Toward
Education,” The School Review, XLVII (September, 1939),
pp. 501-509.


5. Social proficiency test.¹³ The author of
this test defines social proficiency as “consid-
eration for others.” The measure consists of
fifty-two social situations to each of which four
solutions (responses) are offered. The teacher
checks the solution he would use if faced with 
that particular situation. There are no right or 
wrong responses. The aim of the questionnaire 
is to get a status view of the teacher's consid- 
eration for others—of his social proficiency. 
The key assigns a value to each of the four 
choices, varying in weight from one to eight, 
the largest weight being assigned to the choice 
that shows the greatest consideration. Hence, a 
high total score indicates a high degree of social 
proficiency. 


The measure was included in this study be- 
cause, according to the author, it measures some 
quality not measured by psychological and per- 
sonality tests, and because it seems that, other 
things being equal, the teacher with more con- 
sideration for others might be the better 
teacher. 
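
The Spearman-Brown correction quoted for the Harnly scale (.87 becoming .93) can be checked with a few lines of code. This is a minimal sketch: the function names are illustrative, and the odd-even split in `split_half_reliability` is an assumption, since the source does not say how the halves were formed.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, factor=2):
    """Step up a part-test reliability to a test `factor` times as long."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability, stepped up to full length.

    item_scores: list of per-person lists of item scores (hypothetical layout).
    """
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    return spearman_brown(pearson_r(odd, even))


# The half-test coefficient reported for the Harnly scale.
corrected = spearman_brown(0.87)
```

With a half-test coefficient of .87 the formula gives 2(.87)/(1 + .87), which rounds to the .93 reported by the author of the scale.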

13 D. Jackson, “A Test of Social Proficiency” in
“The Measurement of Social Proficiency,” The Journal of
Experimental Education, VIII (June, 1940), pp. 422-474.







TABLE V

CLASS MEANS FOR TEST IN ATTITUDES

[Most of the table body is illegible in the scan. Legible column
headings: No.; Pre-test Mean; σ*. Legible entries are listed below in
scanned row order.]

No. (34 classes): 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6,
3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7

Pre-test Mean (first eighteen rows): 46.25, 35.25, 41.71, 43.57,
40.20, 54.20, 47.17, 37.29, 50.55, 35.43, 35.60, 38.17, 48.29, 34.63,
38.00, 31.29, 35.83, 48.00

σ* (incomplete): 12.87, 18.19, 8.27, 17.20, 18.29, 11.92, 8.83, 6.36,
11.91, 13.62, 8.32, 10.73, 8.35, 8.06

Summary rows (incomplete): 41.38; 12.59, 4.86; 6.80, 4.76

* An improved estimate of the true standard deviation (formula
illegible in the scan). From E. F. Lindquist, Statistical Analysis in
Educational Research, (Boston: Houghton Mifflin Co., 1940), pp. 48-50.
** Based on 181 pupils; Class Nos. 28, 31, and 39 omitted to secure homogeneity.
*** Based on 31 class means; Nos. 28, 31, and 39 omitted to secure homogeneity.

The author reports the contingency coeffi-
cient of validity of the test to be .78 and the
product-moment validity coefficient as .92. The
criterion was the judgment of intimate friends
relative to social proficiency. The reliability
coefficient was .82 when “stepped up” by the
Spearman-Brown formula.


TEACHER RATING SCALES


Three teacher rating scales were used for the
purpose of getting supervisory ratings:

1. Torgerson Diagnostic Teacher Rating
Scale of Instructional Activities.¹⁴
2. Almy-Sorenson Rating Scale for Teach-
ers.¹⁵
3. The Michigan Rating Scale.¹⁶

14 T. L. Torgerson, Diagnostic Teacher Rating Scale of
Instructional Activities, (Bloomington, Illinois: Public School
Publishing Co., Inc., 1930). For full description of this test
see ...
15 ... Rating Scale for Teachers, (Bloomington, Illinois:
Public School Publishing Co., Inc.). For full description of
this test see ...
16 “Michigan Education Association Teachers Rating Card,”
(Lansing, Michigan: Michigan Education Association, 1930).
For full description of this test see p. 19.







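TABLE VI
CLASS MEANS FOR TEST IN APPRECIATION

[Most of the table body is illegible in the scan. Legible column
headings: No.; Pre-test σ*; Final test M and σ. Legible entries are
listed below in scanned row order.]

No. (34 classes): 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6,
3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7

Paired values as scanned under the headings σ*, M, σ (first five
rows): 13.07, 12.92; 19.21, 30.59; 11.58, 13.62; 19.45, 14.40;
21.58, 18.01

Further values, column uncertain: 12.12, 10.95, 11.61, 16.23, 4.73,
5.76, 11.10, 8.49, 9.66

Summary rows (incomplete): 42.36; 42.23

** *** Refer to footnotes Table V.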


SECTION IV 


DEVELOPING THE CRITERIA OF 
TEACHING ABILITY 


Since the teachers in this study are to be 
evaluated in terms of pupil change, the best 
teacher being the one who secures the greatest 
amount of change, the determination of these 
changes assumes utmost significance. 

The first task then was to secure a measure 
or measures of pupil change, and the second to 
determine the amount of this change that may 
be ascribed to the teacher. This section will be 
devoted to an account of the procedures used 
to determine these changes. 

The pupil measures used in this study have
already been described in Section III. They
purported to measure changes in appreciation,
attitude, information, and interest as related to 
the problem of “Community Living”. The 
mean change scores for each of the thirty-four 
classes as well as the mean pre and final scores 
from which they were derived are shown in 
Tables V, VI, VII and VIII. 


THE PROBLEM OF HOMOGENEITY 


Since each teacher is to be measured on the 
basis of the average change in her class, it is
evident that, if such measurement is to be valid,
the classes will have to be approximately alike 
at the beginning of the teaching period. That 
is, to be fair to each teacher, no class should 
differ markedly from another in such factors 
as mental age, intelligence, informational status, 







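TABLE VII
CLASS MEANS FOR TEST IN INFORMATION

[Most of the table body is illegible in the scan. Legible column
headings: No.; Pre-test Mean; σ*. Legible entries are listed below in
scanned row order.]

No. (34 classes): 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6,
3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7

Pre-test Mean (first row only): 59.25

*** Refer to footnote Table V.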


socio-economic status, etc. Table IX lists the
mean intelligence quotients and the mean
mental ages of the pupils in the thirty-four
classes.


To test homogeneity the F test described by
Snedecor¹⁷ was applied to the thirty-four
classes. This test measures the significance of
differences among any number of group means
and determines whether they vary more than
would be expected in a random sampling from
a homogeneous, normally distributed popula-
tion. The F values and the 5% and 1%
tabular values with which they are compared
are shown in Table X. A value of F greater
than the tabular 5% value indicates that the
means vary more than would be expected in
random sampling from a homogeneous, nor-
mally distributed population.

17 G. W. Snedecor, Statistical Methods, (Ames, Iowa: Col-
legiate Press, 1938), pp. 184-198.

As may be seen in Table X, the F values of
four of the pupil factors indicate heterogeneity
with reference to class means greater than ex-
pected at the 5% level. To secure homogeneity,
classes with discrepant means were deleted
from the group one at a time until the F values
fell within the desired 5% tabular values. To
secure this desired degree of homogeneity,
three classes, Nos. 28, 31, and 39, had to be
deleted. Table XI lists the F values of the
remaining 31 classes with the comparative 5%
and 1% tabular values. Most of the F values
have been reduced below not only the 5%
value but the 1% value as well.
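
The F test and the one-at-a-time deletion of classes can be sketched as follows. The function names are illustrative. The rule used here to pick the class to delete (the mean farthest from the grand mean) is an assumption, since the text says only that classes with discrepant means were removed. The tabular F is supplied by the caller rather than computed.

```python
def f_ratio(classes):
    """One-way analysis-of-variance F for a list of classes (lists of scores)."""
    k = len(classes)
    n_total = sum(len(c) for c in classes)
    grand_mean = sum(sum(c) for c in classes) / n_total
    between = sum(len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in classes)
    within = sum((x - sum(c) / len(c)) ** 2 for c in classes for x in c)
    return (between / (k - 1)) / (within / (n_total - k))


def most_discrepant(classes):
    """Index of the class whose mean lies farthest from the grand mean."""
    n_total = sum(len(c) for c in classes)
    grand_mean = sum(sum(c) for c in classes) / n_total
    return max(range(len(classes)),
               key=lambda i: abs(sum(classes[i]) / len(classes[i]) - grand_mean))


def delete_until_homogeneous(classes, critical_f):
    """Drop classes one at a time until F no longer exceeds the tabular value.

    critical_f: callable(df_between, df_within) -> tabular F, supplied by
    the caller (for example from a printed table).
    """
    kept, dropped = list(classes), []
    while len(kept) > 2:
        k, n_total = len(kept), sum(len(c) for c in kept)
        if f_ratio(kept) <= critical_f(k - 1, n_total - k):
            break
        dropped.append(kept.pop(most_discrepant(kept)))
    return kept, dropped


# Four hypothetical classes; the last one is clearly out of line.
classes = [[10, 11, 12], [10, 12, 11], [11, 10, 12], [30, 31, 32]]
kept, dropped = delete_until_homogeneous(classes, lambda df1, df2: 4.0)
```

In this toy example the first pass gives F = 300, the fourth class is removed, and the remaining three classes have identical means, so the loop stops.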

TABLE VIII

CLASS MEANS FOR TEST IN INTEREST

[Most of the table body is illegible in the scan. Legible column
headings: No.; Pre-test Mean; Final test Mean and σ. Legible entries
are listed below in scanned row order.]

No. (34 classes): 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6,
3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7

Pre-test Mean (first seven rows): 31.25, 29.00, 34.86, 28.14, 26.80,
44.40, 32.17

Final test (first row): Mean 33.25, σ 6.60; further σ values:
14.77, 19.09, 9.90

Further values, column uncertain: 10.89, 6.73, 13.47, 9.03

*** Refer to footnotes Table V.


One of the fundamental conditions prerequi-
site to the application of the F test above de-
scribed is that the variances of the several
classes must not differ significantly among
themselves.¹⁸ That this condition prevailed was
determined by applying a chi-square test de-
scribed by Rider¹⁹ and by Lindquist.²⁰ The chi-
square values obtained for the thirty-four
classes are indicated in Table X together with
the comparative 5% and 1% tabular values.
The effect of the deletion of three classes upon
the variance of the classes is shown in Table
XI. (The variances of the thirty-one classes
were well within the 5% value.)


18 Ibid., p. 208.
19 Paul R. Rider, An Introduction to Modern Statistical
Methods, (New York: John Wiley & Sons, Inc., 1939), pp.
102-103.
20 E. F. Lindquist, Statistical Analysis in Educational Re-
search, (Boston: Houghton Mifflin Company, 1940), pp. ...

FACTORS AFFECTING CHANGE


Having thus established through the analysis 
of variance technique that thirty-one of the 
thirty-four classes do not vary more than would 
be expected in a random sampling from a 
homogeneous, normally distributed population, 
the four sets of change scores (See Tables
V, VI, VII and VIII) for these thirty-
one schools were taken and arranged in
descending order, for the four criteria of teach- 
ing effectiveness based on changes in apprecia- 
tion, attitude, information, and interest, re- 
spectively. This procedure is not wholly de- 
fensible, however, since it assumes that all of 
the change or absence of change made by the 
various classes during the experimental period 
is due to the teacher. That such assumption is 







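TABLE IX
MENTAL AGE AND INTELLIGENCE QUOTIENTS

[Most of the table body is illegible in the scan. Legible column
headings: No.; M.A. Mean; σ*. Legible entries are listed below in
scanned row order.]

No. (34 classes): 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6,
3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7

M.A. Mean (first four rows): 154.50, 153.50, 153.00, 154.71

*** Refer to footnotes Table V.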


TABLE X

F AND CHI-SQUARE VALUES USED TO DETERMINE
HOMOGENEITY OF MEANS AND VARIANCE OF
PUPIL FACTORS IN THE 34 CLASSES

Pupil Factor                          F        Chi-square
Information pre-test                2.636        44.352
Attitude pre-test                   2.330        46.398
Appreciation pre-test               2.154         [?]
[?]                                 2.334         [?]
[?]                                 1.280         [?]
[?]                                 1.936         [?]
5% tabular value for 34 classes     1.50          [?]
1% tabular value                    1.76          [?]


TABLE XI

F AND CHI-SQUARE VALUES USED TO DETERMINE
HOMOGENEITY OF MEANS AND VARIANCE
OF PUPIL FACTORS: 31 CLASSES

Pupil Factor                          F        Chi-square
Information pre-test                1.364        36.755
Attitude pre-test                   1.872        38.640
Appreciation pre-test               1.734        40.796
[?]                                  [?]         28.080
[?]                                  [?]         41.580
[?]                                  [?]         36.337
5% tabular value for 31 classes      [?]         43.77
1% tabular value                     [?]         50.89







TABLE XII
RELATION OF PRE-TEST TO CHANGE

[Row labels and the fourth column are illegible in the scan.]

Appreciation    Attitude    Information
    -.40          -.49         -.53
    -.29          -.40         -.40
    -.03          -.38         -.31


TABLE XIII

RELATION OF I.Q. TO CHANGE

[Row labels and the fourth column are illegible in the scan.]

Appreciation    Attitude    Information
     .09           .06          .18
     .09           .06          .20
     .12           .01          .85


TABLE XIV

RELATION OF M.A. TO CHANGE

[Row labels and the heading of the fourth column are illegible in
the scan.]

Appreciation    Attitude    Information    [?]
     .01          -.02          [?]        -.04
     .03           .00          .13        -.02
     .31          -.02          .47         .06


untenable is revealed by study of the relation- 
ship between some of the pupil factors and 
pupil gain as indicated by coefficients of cor- 
relation in Tables XII, XIII and XIV: 


Because many of the correlations are small, 
it is essential to know how large they must be 


to be significantly different from zero. Lind-
quist²¹ provides a formula and table for deter-
mining significance at the 1% and 5% levels.
At the 5% level:


When N = 200, r is significant if it exceeds .14.
When N = 181, r is significant if it exceeds .15.
When N = 31, r is significant if it exceeds .35.
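
These thresholds can be reproduced approximately from the usual relation between r and Student's t with N - 2 degrees of freedom. Whether Lindquist's table was built in exactly this way is an assumption; the t values below are rounded two-tailed 5% points from a standard table.

```python
import math


def critical_r(n, t_critical):
    """Smallest |r| significant for n pairs, given the tabular t for n - 2 df."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Two-tailed 5% values of t taken from a standard table (approximate).
thresholds = {
    200: critical_r(200, 1.972),
    181: critical_r(181, 1.973),
    31: critical_r(31, 2.045),
}
```

The computed values are close to .14, .15, and .355, in line with the figures quoted above.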


THE HIGH PRE-TEST HANDICAP


Inspection of the above correlations in the 
light of this measure of significance indicates 
that the only significant relationships that need 
to concern us here are those between pre-test 
scores and gains. The relation of pre-test score 
to gain is definite. It is evident that pupils 
starting with low pre-test scores make the 
greater gains and those starting with high pre-
test scores make the lesser gains. This relation-
ship holds not only for individuals in the total 
group but also for the classes, except in the 
case of appreciation. This means that a teacher 
starting with a class having a high pre-test 
score is handicapped, since by virtue of this 
pre-test position alone, the gain will be cor-
respondingly low; while a teacher starting with
a class having a low mean pre-test score will,
due to pre-test position alone, have an advan-
tage, as the gain will be correspondingly high.

21 Ibid., pp. 210-212.


This tendency has been found and noted in
other studies. Rostker²² found negative coeffi-
cients between all pre-test and gain scores,
ranging in magnitude from -.18 to -.54 in
eight measures. Rolfe²³ likewise found negative
coefficients between all pre-test and gain scores
ranging from -.29 to -.67. Von Eschen²⁴
found coefficients of correlation between pre-
test and gain ranging from -.37 to -.83.


A plausible explanation of this phenomenon 
of negative relationship between pre-test and 
gain scores is found in the so-called “ceiling” 
concept—that a test of a given number of items 
limits the student with high pre-test scores 
since he may reach the “ceiling’’ whereas the 
student with the low pre-test scores has ample 
opportunity to make rather extensive gain with- 
out nearing the “ceiling”. Examination of the 
distribution of scores in the tests in the present
study indicates that this explanation might
have some validity in one area tested, namely, 
that of Interest, but not in the other tests. For 
these other areas the “ceiling” concept does 
not appear to apply. 


22 Leon E. Rostker, “The Measurement of Teaching Abil-
ity,” published herewith.
23 J. F. Rolfe, “The Measurement of Teaching Ability,”
published herewith.
24 Clarence R. Von Eschen, “An Evaluation of a Super...
with Seventh- and Eighth-Grade Teachers in
the State of Wisconsin,” published herewith.







The learning curve plateau concept as ex- 
planatory of the negative pre-test gain relation- 
ship appears logical if the test used is of the 
power type, but inspection of the test scores in 
this study would hardly permit such an inter- 
pretation. Possibly the tendency for teachers to 
give extra time in coaching pupils with low pre- 
test scores may result in the extra increments. 
Since the tests were not, however, scored by the 
teachers, nor were the individual pupil results 
reported to them, there was no opportunity for 
them to know the pupils with low initial test 
scores.²⁵


TREATMENT OF THE HIGH PRE-TEST SCORES 


In this study a special application of the 
multiple regression equation is employed to 
correct for high initial test scores. 

The extent and the direction of the relation 
of certain pupil factors (pre-score, I.Q., and
M.A.) to gain score has been shown. On the 
basis of these known relationships, prediction 
of the direction and magnitude of gain was 
made through the use of the multiple regres- 
sion technique. Having predicted the amount of 
gain that may be expected, due to known pupil 
factors, this predicted gain may then be sub- 
tracted from the actual gain to secure a measure 
of the contribution of factors other than those 
included in the regression equation, among 
which the teacher is one. If other factors such 
as the home, radio, health, etc., may be assumed 
to be approximately constant or randomly dis- 
tributed from class to class, the residual gain, 
i.e., the actual gain less the predicted gain, may 
then be attributed to the teacher; the residual 
gain thus becomes an indirect measure of teach- 
ing efficiency. 

The proof of the above treatment appears 
simple: Having found significant correlation 
between I.Q., M.A. and pre-test with change, 
a multiple regression equation can be estab- 
lished which when applied should predict the 
amount of pupil change expected from these 
factors. The known change less the predicted 
change should equal the change that may be 
ascribed to unmeasured factors (teaching plus 
a constant). The check on the accuracy of this 
solution consists of correlating the residual 
scores with the pre-test scores, and examining 
the resulting coefficients. 

In line with the foregoing discussion, four
criteria of teaching effectiveness based on
changes in pupils in Appreciation, Attitude,
Information, and Interest as related to Com-
munity Living were developed. A Composite of
the four constituted a fifth criterion.

25 A statement relative to the statistical significance of
these negative correlations will be issued at a later date.

The relationships, expressed as coefficients of 
correlation, necessary to develop the multiple 
regression coefficients are listed in Table XV. 

The Beta coefficients obtained by the Aitken
method²⁶ are reported in Table XVI.

The resulting prediction equation for Appre-
ciation is written:²⁷


\[ G_1 = -.060\,x_1 + .498\,x_2 - .268\,x_3 \]

where $G_1$ = predicted gain in appreciation
      $x_1$ = I.Q. class mean (changed to a standard score)
      $x_2$ = M.A. class mean (changed to a standard score)
      $x_3$ = Appreciation pre-test (changed to a standard score)
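
The three weights in this equation can be checked approximately by solving the normal equations for standardized regression weights. The correlations below are read from the legible parts of Table XV and from the third rows of Tables XII to XIV, and the assignment of those entries to I.Q., M.A., and appreciation pre-test is an assumption. Plain Gauss-Jordan elimination is used here in place of the Aitken method named in the text.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring the largest remaining entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Assumed intercorrelations of the predictors: I.Q., M.A., appreciation pre-test.
predictor_r = [
    [1.00, 0.68, 0.59],
    [0.68, 1.00, 0.55],
    [0.59, 0.55, 1.00],
]
# Assumed correlations of each predictor with gain in appreciation.
gain_r = [0.12, 0.31, -0.03]

betas = solve_linear_system(predictor_r, gain_r)
```

The solution comes out near -.061, .499, and -.269, which agrees with the printed weights to within the rounding of the two-decimal correlations.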


The prediction equations for gain in Atti-
tude, Information, and Interest were similarly
developed. Applied to the class means changed
to standard scores,²⁸ gains for each class in
each of the four measures were predicted.²⁹
These predicted gains were then subtracted
from the observed gains and the differences
(the residual gains)³⁰ ascribed to the teacher,
plus factors not measured, assumed to be con-
stant from class to class.
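
The residual-gain computation can be sketched as below, using the appreciation weights from the prediction equation. The class means are invented for illustration, the function names are hypothetical, and the standard scores use an N denominator, which is an assumption because the formulae used in the study are not legible in the source.

```python
def standard_scores(values):
    """Convert raw values to standard scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


# Beta weights for appreciation as given in the prediction equation.
BETAS = (-0.060, 0.498, -0.268)


def residual_gains(iq, ma, pre, gain):
    """Observed minus predicted gain, all in standard-score form."""
    z_iq, z_ma, z_pre, z_gain = (standard_scores(v) for v in (iq, ma, pre, gain))
    predicted = [BETAS[0] * a + BETAS[1] * b + BETAS[2] * c
                 for a, b, c in zip(z_iq, z_ma, z_pre)]
    return [g - p for g, p in zip(z_gain, predicted)]


# Hypothetical class means for four classes.
iq = [98.0, 102.0, 105.0, 95.0]
ma = [150.0, 156.0, 160.0, 148.0]
pre = [40.0, 45.0, 38.0, 50.0]
gain = [6.0, 4.0, 8.0, 2.0]
criterion = residual_gains(iq, ma, pre, gain)
```

A positive residual marks a class that gained more than its I.Q., M.A., and pre-test standing would predict; a negative residual marks one that gained less.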


Thus four criterion scores were obtained for 
each teacher, based on the four objectives 
measured, so the teachers could be ranked from 
highest to lowest on the basis of class gain in 
appreciation, attitudes, information, or interest. 
A fifth composite criterion score was obtained 
for each teacher by adding the residual gains in 


26 G. H. Thomson, The Factorial Analysis of Human Ability,
(Boston: Houghton Mifflin Company, 1939), pp. 89-95.
27 Ibid., p. 92. “Scores to be substituted in this regression
equation must be standard scores.”
28 See Tables A9-A13, ... in the University Library, ...
... converted into ... separate scores ... were derived by the
standard formulae ...; the standard scores for the final test
scores ...







TABLE XV
COEFFICIENTS OF CORRELATION EMPLOYED IN DEVELOPING MULTIPLE REGRESSION COEFFICIENTS
(N = 181)

[Most of the table body is illegible in the scan. Legible column
headings: M.A.; Appreciation pre-test; Attitude pre-test; Information
pre-test; Interest pre-test. Legible rows as scanned:
.68, .59, .62, [?], .15
1.00, .55, .53, [?], .14
Remaining rows illegible.]


TABLE XVI 
BETA COEFFICIENTS 


TABLE XVII
CRITERIA SCORES FOR EACH TEACHER BASED ON FOUR PUPIL MEASURES AND COMPOSITE

[Table body not recoverable from the scan. Column headings: Teacher;
Appreciation, Rank; Attitude, Rank; Information, Rank; Interest, Rank;
Composite, Rank.]

* Teachers number 28, 31, and 39 are omitted to secure homogeneity.








TABLE XVIII 
INTERCORRELATIONS OF CRITERIA SCORES 


* To save time and space the following abbreviations are used.
C-app. = Criterion scores in Appreciation
C-att. = Criterion scores in Attitude
C-inf. = Criterion scores in Information
C-int. = Criterion scores in Interest
C-com. = Composite criterion score


the four measures.³¹ Table XVII lists the five
criterion scores for each teacher and the rank 
which each teacher holds with reference to the 
other teachers in each of the measures. 
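
Forming the composite and ranking the teachers takes only a few lines. The teacher labels and scores below are invented for illustration, and ties are not given any special treatment, which is a simplification.

```python
def composite_and_ranks(criteria):
    """criteria: dict teacher -> (appreciation, attitude, information, interest)."""
    # The composite is the sum of the four residual gains in standard-score form.
    composite = {teacher: sum(scores) for teacher, scores in criteria.items()}
    ordered = sorted(composite, key=composite.get, reverse=True)
    ranks = {teacher: place for place, teacher in enumerate(ordered, start=1)}
    return composite, ranks


# Hypothetical criterion scores for three teachers.
criteria = {
    "Teacher A": (0.40, 0.10, -0.20, 0.30),
    "Teacher B": (-0.50, -0.30, 0.10, -0.10),
    "Teacher C": (0.90, 0.60, 0.50, -0.40),
}
composite, ranks = composite_and_ranks(criteria)
```

The same ranking step can be applied to any single criterion by passing one-element tuples.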

It has been previously stated that theoret- 
ically these criterion scores should be free from 
the effects of high pre-test scores. Proof that 
such freedom has been established is found in 
the coefficients of correlation between pre-test 
and residual gain (i.e. the teacher criterion 
scores). These correlations were found to be 
—.11 for information, —.11 for interest, .10 
for attitude, and —.23 for appreciation. None 
of these are statistically significant. 
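
The check described here, correlating residual gain with pre-test score, can be illustrated with a single predictor. The data are invented, and a one-variable least-squares fit stands in for the three-variable equation used in the study.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_residuals(x, y):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


# Hypothetical class means: high pre-test classes gain less.
pre_test = [35.0, 38.0, 41.0, 46.0, 50.0, 54.0]
gain = [9.0, 7.5, 8.0, 5.0, 4.5, 2.0]

raw_r = pearson_r(pre_test, gain)
residual_r = pearson_r(pre_test, simple_residuals(pre_test, gain))
```

The raw gains correlate strongly and negatively with the pre-test, while the residual gains are uncorrelated with it apart from floating-point error, which is the property the criterion scores are meant to have.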


INTER-CRITERIA RELATIONSHIPS 


Since M.A. and I.Q. also entered into the
regression equations, the criterion scores are
likewise independent of these factors. Each of
the four criterion scores is thus taken to be
a measure of that portion of the class gain that
may be attributed to the teacher plus other 
factors not measured, but assumed to be con- 
stant from class to class. As has been previously 
noted, the fifth criterion score is a composite 
of the other four. 

Examination of these criterion scores and the 
resulting ranking of the teachers (Table XVII) 
indicates that in three of the measures, appre-
ciation, attitude, and information, the rankings
of the teachers are quite comparable, i.e., a 
teacher ranking low in one measure ranks low 
in the other measures and vice versa. The situa- 
tion in the case of the measures of interest is 
quite different, the rankings showing no such 
conformity to other measures. 

These relationships are more definitely re- 
vealed in the intercorrelations of the various 
criterion scores which are listed in Table XVIII.

** “Standard scores may be added, subtracted, or averaged.”
Henry E. Garrett, Statistics in Psychology and Education
(New York: Longmans, Green and Co., 1937), p. 179.


According to McCall:** “Scientific measure- 
ment (of teaching efficiency) is fair only when 
we measure the amount of desirable change 
produced in a pupil (1) by a given teacher . . . 
(2) in a standard time . . . (3) in standard 
pupils . . . and (4) when the measurement is 
complete.” That the criteria herein developed 
approximate McCall’s specifications will doubt- 
less be granted since, (1) the pupils measured 
were taught by just one teacher, (2) the time 
element was the same for all classes, (3) class 
(pupil) differences in I.Q., M.A., and pre-test
were controlled statistically, and (4) measure- 
ment in the area taught (community living) 
was complete in that not only information but 
appreciation, attitudes, and interests were 
measured. 


SECTION V 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES AND SUPER- 
VISORY RATINGS 


This study calls for the determination of cri-
teria of teaching efficiency based upon objec-
tively determined changes in pupils. The pre-
ceding section has detailed the procedure
through which such criteria have been deter-
mined. This section will be devoted to com-
parisons between the criteria and the factors
measured by the teacher tests and rating scales
previously described.


INTELLIGENCE OF TEACHERS AS A FACTOR 
RELATED TO TEACHING EFFECTIVENESS 


Among all the factors that have been com- 
pared with criteria of teaching success, intelli- 
gence, as measured by psychological tests, has 


** W. A. McCall, Measurement (New York: Macmillan Co.,
1939), p. 404.







appeared in more studies than has any other
factor. The test used as a measure of
teacher intelligence in this investigation was
the American Council Psychological Examina-
tion for College Freshmen, 1936 Edition. The
teacher scores, expressed as part and total
scores, are shown in Table XXI. Comparison
of these scores with the 1936 norms** for
freshmen in teachers colleges indicates that, in
total score, the teachers ranged from about the
30th percentile to the 99th percentile. In Q1,
median, and Q3 the comparison was as pre-
sented in Table XIX.

These teachers, then, are slightly higher in 
intelligence than are freshmen entering teacher 
training institutions. Practically the same condi- 
tion is found in relation to the different parts 
of the test. 

The correlations between the total scores and 
the criterion scores are given in Table XX.

** L. L. Thurstone and Thelma G. Thurstone, “The 1936
American Council Psychological Examination for College
Freshmen,” Educational Record, XVIII (April, 1937).




TABLE XIX


COMPARISON OF SCORES FOR TEACHERS IN THIS
STUDY AND TEACHERS COLLEGE FRESHMEN


              Freshmen in
              Teachers        Teachers in
              Colleges        this study
Q1              124              155
Median          161              189
Q3          (illegible)          235


TABLE XX


CORRELATIONS BETWEEN TOTAL SCORES AND
CRITERION SCORES


                                            r
C-att. with total psychological            .39
C-app. with total psychological            .4 (second digit illegible)
C-inf. with total psychological            .49
C-int. with total psychological            .02
C-com. with total psychological            (illegible)

N = 31 (pairs of scores)
Significant at 5% level if r =             .36
Significant at 1% level if r =             .46


TABLE XXI 
TEACHERS’ SCORES ON AMERICAN COUNCIL PSYCHOLOGICAL EXAMINATION 


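[The layout of this table is not recoverable. Column heads III and IV
are legible, and the legible entries, in the order scanned, are: 46, 40;
15, 36; 26; 59; 30; 31; 50.94; 18.14; 37.55; 33.74; 18.25; 8.80;
194.29; 51.38.]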




It is evident that there is a significant rela- 
tionship between four of the criteria and intel- 
ligence as measured by the above test. Pupil 
change in information shows the highest rela- 
tionship (being significant at the 1% level) and 
changes in interest the lowest. Checking back 
over these tests one finds that the interest test 
had a lower reliability than the other tests, and 
that the mean class gain scores were the small- 
est, which may account, at least in part, for the 
low correlation. It will be recalled that the in- 
terest criterion differed from the other criteria.** 
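
The significance thresholds quoted with the tables (.36 and .46 for N = 31) can be checked from the usual relation between r and t with N - 2 degrees of freedom. The sketch below is an illustration, not the author's procedure: the two-tailed critical t values are assumed inputs copied from a standard t table, and the names `T_CRITICAL` and `critical_r` are hypothetical.

```python
from math import sqrt

# Two-tailed critical t values from a standard t table (assumed inputs),
# keyed by (degrees of freedom, significance level).
T_CRITICAL = {
    (29, 0.05): 2.045,
    (29, 0.01): 2.756,
    (18, 0.05): 2.101,
}

def critical_r(n_pairs, alpha):
    """Smallest |r| significant at `alpha` for `n_pairs` pairs of scores,
    using r = t / sqrt(t^2 + df) with df = n_pairs - 2."""
    df = n_pairs - 2
    t = T_CRITICAL[(df, alpha)]
    return t / sqrt(t * t + df)

r_31_5pct = critical_r(31, 0.05)   # about .355
r_31_1pct = critical_r(31, 0.01)   # about .456
r_20_5pct = critical_r(20, 0.05)   # about .444
```

Rounded upward to two places, these values agree with the .36 and .46 thresholds given for N = 31 and with the .44 threshold given elsewhere in this section for N = 20.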

Having found such a significant relationship
between the intelligence factor and the criteria
of teaching effectiveness, it seemed advisable to
compare the various criteria with the different
parts of the psychological test to discover which
contributed most to the relationship.
These correlations are reported in Table XXII.

It is evident that certain parts of the psy-
chological test correlate more highly with the
various criteria than do other parts. Completion
(Part I), artificial language (Part III), and
analogies (Part IV) apparently contribute more
to the total relationship found than do arith-
metic (Part II) and opposites (Part V). Again
it is evident that the psychological examination
is not related significantly to the interest cri-
terion.

Testing the practicability of applying the
principle that a number of tests or subtests,
significantly related to a criterion and not
highly inter-related, form the most reliable
predictive device, the intercorrelations between
the teachers’ scores on the different parts of the
psychological test were calculated (Table
XXIII).

** See Section III.

These intercorrelations, being quite similar in
size to those usually reported for parts of the
psychological examination, indicated the possi-
bility of forming a composite of the parts of
the psychological test that were significantly re-
lated to the criterion (Parts I, III, and IV) and
comparing this composite with the criterion.
Likewise, the individual criteria showing great-
est consistency (C-app., C-att., and C-inf.) were
composited into a new composite criterion.
Thus comparisons could be made between a
psychological examination composited from
three of its five parts (Psych. I, III, and IV)
and the two composite criteria: C4-com. (com-
posed of all four criteria) and C3-com. (com-
posed of C-app. + C-att. + C-inf.). The
following coefficients of correlation were
found:
                                               r
Psych. I, III, and IV with C4-com.            .52
Psych. I, III, and IV with C3-com.            .55


Thus by omitting two parts of the psycho-
logical examination, its validity as a measure
of teaching efficiency with C4-com. as the cri-
terion was increased to .52; and with C3-com.
as the criterion it was increased to .55.**


** 40 items out of 212 or 18.7% were found to be valid
(significant at 5% level).


TABLE XXII 
CORRELATIONS FOR PARTS OF PSYCHOLOGICAL TEST AND THE CRITERIA OF TEACHING EFFICIENCY 


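Columns: Part I (Completion), Part II (Arithmetic), Part III (Language),
Part IV (Analogies), Part V (Opposites).

[The row and column alignment of this table is not recoverable. Legible
entries, in the order scanned: .16, .28, .25, .07, .24; .41*, .24;
.46**; .43*; .14; .49**.]

* Significant at 5% level.
** Significant at 1% level.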


TABLE XXIII 
INTERCORRELATIONS BETWEEN DIFFERENT PARTS OF THE PSYCHOLOGICAL EXAMINATION 


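Rows and columns: I Completion, II Arithmetic, III Artificial Language,
IV Analogies, V Opposites.

[The alignment of this table is not recoverable. Legible entries, in the
order scanned: .58, .28, .46.]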





That this is an unusually high correlation
between a measure of intelligence and teaching
efficiency is seen in comparisons with results of
other investigations. Somers, Boardman,
Phillips, Ullman, Knight, Whitney, and
Odenweller report, respectively, coefficients
of correlation of .43, .33, .26, .15, .09, .03,
and .00 between the intelligence measures used
and teacher effectiveness based upon super-
visory ratings.

Using the objective criterion of pupil change,
Armstrong found a correlation of .20; Barr,
Torgerson, et al. of -.19; and Rostker of
.48. The present study substantiates the findings
of Rostker that intelligence, as measured by the
American Council Psychological Examination,
appears significantly related to teaching effi-
ciency, at least as herein defined.


ATTITUDE OF TEACHERS TOWARD TEACHING
AS A FACTOR RELATED TO TEACHING
EFFICIENCY


Attitude toward teachers and teaching as
measured by the Yeager test may also be inter-
preted as interest in the profession of teaching.
Reports of investigators show little consistency
as to the interest factor. Using expert judgment
as the criterion of teacher success, Kriner re-
ports interest an important factor, while Ull-
man and Phillips report practically no rela-
tionship. With pupil change as the criterion,
Barr and others report no relationship for the
Strong Vocational Interest Blank. Rostker re-
ports, however, a significant relationship
and Rolfe a small positive relationship, both
using the Yeager Test.

The relationships found in this investigation
expressed as coefficients of correlation are re-
ported in Table XXIV:


TABLE XXIV


CORRELATIONS BETWEEN SCORES ON THE YEAGER TEST OF INTEREST IN
TEACHING AND THE SEVERAL CRITERIA


(entries illegible)
N = 31; These correlations are significant at 5% level if r = .36


None of these correlations are statistically
significant; the correlation of scores on the
Yeager Test with a composite of the criterion
scores gave a correlation of .16 which is not
statistically significant.


G. T. Somers, Pedagogical Prognosis: Predicting the Suc-
cess of Prospective Teachers, Contributions to Education
(New York: Bureau of Publications, Teachers College,
Columbia University).

Boardman, Professional Tests as a Measure of Teaching
Efficiency in High Schools, Contributions to Education,
No. 327 (New York: Bureau of Publications, Teachers
College, Columbia University).

Phillips, An Analysis of Certain Characteristics of
Active and Prospective Teachers, Contributions to Education,
No. 161 (Nashville, Tennessee: George Peabody College for
Teachers, 1935).

R. R. Ullman, The Prognostic Value of Certain Factors
Related to Teaching Success (Ashland, Ohio: A. L. Garber).

Knight, Qualities Related to Success in Teaching,
Contributions to Education, No. 120 (New York: Bureau
of Publications, Teachers College, Columbia University,
1922).

Whitney, The Prediction of Teaching Success, Journal of
Educational Research Monographs, No. 6 (Bloomington,
Illinois: Public School Publishing Co., 1924).

Arthur L. Odenweller, Predicting the Quality of Teaching,
Contributions to Education, No. 676 (New York: Bureau of
Publications, Teachers College, Columbia University).

A. S. Barr, T. L. Torgerson, and others, “The Validity
of Certain Instruments Employed in the Measurement of
Teaching Ability,” found in The Measurement of Teaching
Efficiency (New York: Macmillan Co., 1935), pp. 73-141.

L. E. Rostker, The Measurement and Prediction of
Teaching Ability, Unpublished Ph.D. Thesis, University of
Wisconsin.


KNOWLEDGE OF THE THEORY AND PRACTICE
OF MENTAL HYGIENE AS A FACTOR 
IN TEACHING EFFICIENCY 


In the present study the teachers’ knowledge 
of the theory and practice of mental hygiene as 
measured by the Torgerson test was correlated 
against criteria of pupil change (Table XXV). 


TABLE XXV
CORRELATIONS BETWEEN SCORES OF MENTAL HYGIENE AND CRITERIA OF TEACHING EFFICIENCY


Knowledge of Mental Hygiene with the several criteria (entries illegible)
Significant at 5% level if r = (value illegible)


The correlations are low, except that with
appreciation, which reaches significance at the
5% level. They are all positive except that with
the interest criterion. The correlation with a com-
posite of these criterion scores was .24. In view
of the evidence presented, the validity of the
test as a measure of teaching efficiency is low
(significant at about the 7% level).


H. L. Kriner, Pre-Training Factors Predictive of Teacher
Success, Pennsylvania State Studies in Education, No. 1
(State College, Pennsylvania: Pennsylvania State College).

R. R. Ullman, The Prognostic Value of Certain Factors
Related to Teaching Success (Ashland, Ohio: A. L. Garber).

Phillips, An Analysis of Certain Characteristics of
Active and Prospective Teachers, Contributions to Education,
No. 161 (Nashville, Tennessee: George Peabody College for
Teachers, 1935).

A. S. Barr, W. H. Burton, and L. J. Brueckner, Super-
vision (New York: D. Appleton-Century Co.).

L. E. Rostker, “The Measurement and Prediction of
Teaching Ability,” School and Society, LI (January, 1940),
pp. 31-32.

The relationship was negative.

Rolfe, p. (page illegible).


RELATION OF EDUCATIONAL CONSERVATISM
AND PROGRESSIVISM TO TEACHING
EFFICIENCY


No mention has been found in previous in-
vestigations of the factor of conservatism. As
has been pointed out in the description of the
Harnly test,* used to measure this factor, the
teacher’s conservatism or progressivism with
reference to educational purposes, policies,
objectives, and methods was measured. The
relationships to the criteria, with the r’s reflected
so that positive r’s show a liberal position and
negative r’s a conservative position, are given in
Table XXVI.


TABLE XXVI
CORRELATIONS BETWEEN PARTS OF HARNLY TEST AND CRITERIA OF TEACHING EFFICIENCY


                            C-app.    C-inf.    C-int.
Educational Purposes         .08       .33       .06
Educational Policies         .03       .16       .03
Educational Objectives       .09       .19      -.06
Educational Methods         -.38      -.14      -.23
Educational Total            .04       .16      -.10

(The columns for C-att. and C-com. are illegible.)
N = 20; Significant at 5% level if r = .44


Evidently whether a teacher is liberal or con-
servative with reference to educational pur-
poses, policies, and objectives makes little
difference with pupil changes in attitudes,
appreciations, information, and interests as
measured in this study. The correlations be-
tween these criteria and educational methods are
somewhat higher. Since all of these correlations
are negative, it appears that the better teachers
tend to be more conservative than poorer
teachers.


RELATION OF SOCIAL PROFICIENCY OF THE
TEACHERS TO TEACHER EFFICIENCY


Various personality factors have been studied
in relation to teaching efficiency. Social intelli-
gence as measured by the Social Intelligence
Test by Moss and others was found insignifi-
cantly related by Ullman and Barr, with
coefficients of correlation of .18 and .19 re-
spectively. Personality, subjectively estimated,
was found significantly related by Somers and
Odenweller, with coefficients of correlation of
.62 and .83 respectively when the criterion was
determined by subjective expert judgment. In
the latter comparison, the judgments of per-
sonality and teaching efficiency were made by
the same supervising officers.

Social proficiency as here reported was
measured by Jackson’s “Social Proficiency
Test.” The relationship between this test and
teaching efficiency is indicated in Table XXVII.

* See Section III of this report.

Interpreted broadly, the above results mean
that the teacher who secures the greatest desir-
able change in pupils’ attitudes, appreciations,
information, and interests is inclined to be less
considerate of others as here measured than is
the teacher who is less effective.


THE VALIDITY OF SUPERVISORY RATINGS AS 
MEASURES OF TEACHING EFFICIENCY 


Through the cooperation of the county super-
intendent and supervising teachers having
supervisory jurisdiction over the teachers who
participated in this study, it was possible to
secure three separate ratings on each teacher
from each of two raters. These ratings, based
on the Torgerson, Michigan, and Almy-
Sorenson teacher rating scales, are reported in
Table XXVIII.


R. R. Ullman, The Prognostic Value of Certain Factors
Related to Teaching Success (Ashland, Ohio: A. L. Garber).

A. S. Barr, T. L. Torgerson, and others, “The Validity
of Certain Instruments Employed in the Measurement of
Teaching Ability,” found in The Measurement of Teaching
Efficiency (New York: Macmillan Co., 1935), pp. 73-141.

G. T. Somers, Pedagogical Prognosis: Predicting the Suc-
cess of Prospective Teachers, Contributions to Education
(New York: Bureau of Publications, Teachers College,
Columbia University).

A. L. Odenweller, Predicting the Quality of Teaching,
Contributions to Education (New York: Bureau of
Publications, Teachers College, Columbia University, 1936).


TABLE XXVII


CORRELATIONS BETWEEN SCORES ON JACKSON SOCIAL PROFICIENCY TEST AND
CRITERIA OF TEACHING EFFICIENCY


(entries illegible)
N = 20
Significant at 5% level if r = .44


TABLE XXVIII
TEACHER RATINGS BY COUNTY SUPERINTENDENT AND SUPERVISING TEACHER


Columns: County Superintendent (Torgerson Scale, Michigan Scale,
Almy-S. Scale); Supervising Teacher (Torgerson Scale, Michigan Scale,
Almy-S. Scale).

[The row and column alignment of this table is not recoverable. Legible
entries, in the order scanned:
610, 330, 457, 340, 684, 675, 601, 528, 555, 605, 673, 591, 573, 636;
10.77, 109.82;
153, 124, 182, 147, 154, 119, 138, 118, 146, 120, 178, 144, 153, 141,
159, 151, 122, 180, 112, 152, 102, 128, 68, 111;
737; 370, 7; 564, 111; 394, 107; 639, 157; 733, 154; 684, 187;
543, 105; 619, 147; 525, 86; 725, 177; 631, 144; 458; 331; 538;
154; 103; 170; 150; 177; 84, 457; 30;
136.37, 26.58; 58.63; 13.87, 126.40.]

* Teachers number 28, 31, and 39 are omitted to secure homogeneity.


Since in the analysis of these scores, com- 
posites of the three separate ratings by the 


superintendent and of the three ratings by the 
supervising teacher were desired, the raw scores 
were changed to standard or sigma scores* 
and all calculations and comparisons were made 
from these standard scores. 
* $\dfrac{\text{Score} - \text{Mean Score}}{\sigma} = $ standard (sigma)
score. “Standard scores may be added, subtracted, or averaged.”
Henry E. Garrett, Statistics in Psychology and Education, Second
Edition (New York: Longmans, Green and Co., 1937).
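
The conversion to standard scores and the compositing of the three ratings can be sketched as follows. This is an illustration under stated assumptions, not the computation actually performed: the ratings are invented, the standard deviation is taken with N in the denominator, the composite is taken as the simple average of the three standard scores, and the function names are hypothetical.

```python
from math import sqrt

def standard_scores(raw):
    """Convert raw ratings to standard (sigma) scores:
    (score - mean) / standard deviation, with N in the denominator."""
    m = sum(raw) / len(raw)
    sd = sqrt(sum((x - m) ** 2 for x in raw) / len(raw))
    return [(x - m) / sd for x in raw]

def composite(*scales):
    """Average the standard scores each teacher received on several scales."""
    return [sum(vals) / len(vals) for vals in zip(*scales)]

# Invented ratings for four teachers on three scales with different units.
torgerson = [12.0, 9.0, 15.0, 10.0]
michigan = [610.0, 330.0, 684.0, 457.0]
almy_sorenson = [153.0, 124.0, 182.0, 147.0]

z_scales = [standard_scores(s) for s in (torgerson, michigan, almy_sorenson)]
composite_rating = composite(*z_scales)
```

Because each scale is first reduced to mean 0 and unit spread, the scale with the largest raw numbers does not dominate the composite.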







The primary purpose of securing supervisory 
ratings was to compare such ratings with the 
objectively determined pupil change criteria. 

If the superintendents’ three ratings of the
teachers intercorrelated highly, then a composite
of the three could be compared with the criteria
of teaching. The same logic applies to the
supervisory teachers’ ratings. The intercorrela-
tions indicated were calculated (Table
XXIX).**
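
A footnote to this passage cites a correlation formula for ratings already expressed as standard scores. A minimal sketch of that kind of computation is given below, assuming the formula is the mean product of paired standard scores and that the standard deviation is taken with N in the denominator; the data are invented and the function names are hypothetical.

```python
from math import sqrt

def standard_scores(raw):
    """(score - mean) / standard deviation, with N in the denominator."""
    m = sum(raw) / len(raw)
    sd = sqrt(sum((x - m) ** 2 for x in raw) / len(raw))
    return [(x - m) / sd for x in raw]

def r_from_standard_scores(zx, zy):
    """Correlation as the mean product of paired standard scores."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Invented ratings of four teachers on two scales.
scale_a = [1.0, 2.0, 3.0, 4.0]
scale_b = [2.0, 1.0, 4.0, 3.0]

r_ab = r_from_standard_scores(standard_scores(scale_a),
                              standard_scores(scale_b))
r_aa = r_from_standard_scores(standard_scores(scale_a),
                              standard_scores(scale_a))
```

With scores already standardized, no further means or deviations are needed, which is presumably why the shorter formula was convenient here.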


TABLE XXIX


INTERCORRELATIONS BETWEEN SUPERVISORY
RATINGS


                                                Supervising
                                  Co. Supt.     Teacher
Almy-Sorenson with Michigan      (illegible)      .73
Almy-Sorenson with Torgerson     (illegible)      .88
Michigan with Torgerson          (illegible)    (illegible)

N = 31
Significant at 1% level if r = .46


Correlations between each of the ratings and 
each of the criteria were computed (Table 
XXX): 




If pupil change as here measured is accepted
as a valid criterion of teaching efficiency, then
the supervisory ratings here provided are
invalid.

The correlations for composites of the 
superintendents’ and supervising teachers’ 
ratings are reported in Table XXXI. 

These results substantiate the above conclu- 
sion with reference to supervision ratings. 

Changing the point of emphasis for the time
being, and thinking of the supervisory ratings
as the criterion of teaching ability, the other
teacher measures were compared with this cri-
terion. These correlations are reported in Table
XXXII.

The results indicate clearly that there is no
significant relationship between any of the
teacher measures used in this investigation and
teaching efficiency as measured by supervisory
ratings. The results on the Harnly test of liberal
and conservative viewpoint approach signifi-
cance.

TABLE XXX
CORRELATIONS BETWEEN SUPERVISORY RATINGS AND CRITERIA OF TEACHING EFFICIENCY


Columns: Superintendents’ Ratings (Almy-S., Michigan, Torgerson);
Supervising Teachers’ Ratings (Almy-S., Michigan, Torgerson).

[The row and column alignment of this table is not recoverable. Legible
entries, in the order scanned: -.05; -.29, -.23; -.22, -.27.]

Significant at 5% level if r = .36


TABLE XXXI


CORRELATIONS BETWEEN COMPOSITE OF SUPER-
VISORY RATINGS AND CRITERIA OF
TEACHING EFFICIENCY


Columns: Composite Supt. Rating; Composite Sup. Teachers’ Rating.

[The row and column alignment of this table is not recoverable. Legible
entries, in the order scanned: -.27; .04; -.05; -.14; .10, -.02;
.04, -.19; .00, -.24.]


** Since these ratings were expressed as standard scores, the
following correlation formula was used (from E. F. Lindquist,
A First Course in Statistics, p. 150):

$$ r_{xy} = \frac{\sum z_x z_y}{N} $$


Table XXXIII summarizes the findings pre-
sented in this section concerning the relation
of various teacher factors to the criteria of
teaching efficiency and the statistical validity of
the instruments used to secure a measure of
these factors.

It is evident from these data that the statis- 
tical validity of only one of the measures, 
namely the American Council Psychological 
Examination, meets the statistical significance 
essential to an instrument useful for predictive 
purposes. How effective this test would be for 
predictive purposes is shown by comparing the 
upper and lower fourths of the teachers selected 
on the basis of this test (Psych. I, III, and IV) 
with the upper and lower fourths of the cri- 
terion group (C-com.-3). On the basis of 
chance, the upper fourth on the test would be 
found distributed equally in the four quarters 
of the criterion group. 







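TABLE XXXII


CORRELATIONS BETWEEN CERTAIN TEACHER MEASURES AND COMPOSITES
OF SUPERVISORY RATINGS


Columns: Composite Supt. Rating; Composite Sup. Teacher Rating.

Intelligence (American Council Psych.)*
Knowledge of Mental Hygiene (Torgerson)*
Attitude toward Teaching (Yeager)*
(remaining row labels illegible)

[The alignment of the entries is not recoverable. Legible entries under
Composite Supt. Rating, in the order scanned: -.12, (illegible), -.15,
.39, -.89, .01. The entries under Composite Sup. Teacher Rating are
illegible.]

* N = 30; Significant at 5% level if r = .36
** N = 20; Significant at 5% level if r = .44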


TABLE XXXIII


STATISTICAL VALIDITY OF TEACHER MEASURES WHEN CRITERION IS
DESIRABLE CHANGE IN PUPILS


Columns: Measures; Statistical Validity with C-com.(4) (a) and with
C-com.(3) (b).

Measures:
American Council Psychological
    Part 1 Completion
    Part 2 Arithmetic
    Part 3 Artificial Language
    Part 4 Analogies
    Part 5 Opposites
    Composite of Parts 1, 3 and 4
Knowledge of Mental Hygiene (Torgerson)
Attitude Toward Teaching (Yeager)
Social Proficiency (Jackson)
Liberal (progressive) Viewpoint (Harnly)
    As to Educational Purposes
    As to Educational Policies
    As to Educational Objectives
    As to Educational Method

[The alignment of the entries is not recoverable. Legible entries under
C-com.(4), in the order scanned: .425*, .319, .240, .476**, .485**, and
one truncated entry ending in 64. The entries under C-com.(3) are
illegible.]

a Criterion composited from C-app., C-att., C-inf., and C-int.
b Criterion composited from C-app., C-att., C-inf.

* Statistically significant at 5% level.
** Statistically significant at 1% level.
*** The sign has been reflected.


Comparing upper and lower fourths, the
following results were obtained:

% of upper fourth on test found in
upper fourth of criterion

% of upper fourth on test found in
lower fourth of criterion

% of lower fourth on test found in
lower fourth of criterion

% of lower fourth on test found in
upper fourth of criterion


121% 
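
The upper-and-lower-fourths comparison can be sketched as a small cross-tabulation. The sketch assumes fourths are formed by rank order with ties broken by position; the scores are invented and the names `quarter_labels` and `percent_in_quarter` are hypothetical, since the text does not describe how the fourths were cut.

```python
def quarter_labels(scores):
    """Label each score 1 (lowest fourth) to 4 (highest fourth) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // len(scores) + 1
    return labels

def percent_in_quarter(test, criterion, test_q, crit_q):
    """Percent of teachers in fourth `test_q` on the test who fall
    in fourth `crit_q` on the criterion."""
    tq = quarter_labels(test)
    cq = quarter_labels(criterion)
    group = [i for i, q in enumerate(tq) if q == test_q]
    hits = sum(1 for i in group if cq[i] == crit_q)
    return 100.0 * hits / len(group)

# Invented scores for eight teachers: a perfectly consistent case
# and a perfectly reversed case.
test_scores = [10, 20, 30, 40, 50, 60, 70, 80]
same_order = [1, 2, 3, 4, 5, 6, 7, 8]
reversed_order = [8, 7, 6, 5, 4, 3, 2, 1]

upper_upper = percent_in_quarter(test_scores, same_order, 4, 4)
upper_lower = percent_in_quarter(test_scores, reversed_order, 4, 1)
```

Under chance alone each of the four percentages for a given test fourth would be expected near 25, which is the baseline the text invokes.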


How does this compare with other instru-
ments used for predictive purposes? Apropos of
intelligence tests and prediction, Freeman*
says:

“The correlation between intelligence
tests and composite standing of pupils may
be said, then, to lie usually between .40 and
.60. Probably in the majority of cases the
correlations will be found in the neighbor-
hood of .50, but under very favorable con-
ditions, it may be somewhat above this. The
practical meaning of this correlation is that
it enables us with moderate degree of accu-
racy to predict the grade of work which a
student will do in school or college.”


* Frank N. Freeman, Mental Tests (Boston: Houghton
Mifflin Company, 1926), p. 372.


In recent years colleges and universities have
been using “aptitude” or “scholarship” tests for
purposes of predicting college grade point
average. In his study of such prediction devices
used at the University of Wisconsin, Froelich*
found the validity of the Wisconsin Achieve-
ment Test as an instrument for prediction of
scholastic success for first semester freshmen
to be .61, while the validities of secondary
school rank, the American Council Psychological
Examination, and the Henmon-Nelson Test
were .62, .55, and .48 respectively. From his
observation that such a test is “as good a
measure of university success as any other
available measure,” it is apparent that predic-
tive measures of really high validity are diffi-
cult to develop, and that the relationship found
in this study between parts of the American
Council Psychological Examination and the
criterion of teaching efficiency is relatively high.


* Gustav J. Froelich, The Validity of the Wisconsin
Achievement Test as an Instrument for Prediction of Scho-
lastic Success at the University of Wisconsin, unpublished
thesis (Madison, Wis.: University of Wisconsin, 1940).


TABLE XXXIV
INTERCORRELATIONS AMONG TEACHER MEASURES AND THE CRITERION


X0 Criterion (pupil change)
X1 American Council Psychological (Parts I, III, and IV)
X2 Torgerson Mental Hygiene
X3 Jackson Social Proficiency
X4 Harnly Attitude Toward Method*

[The body of this table is largely illegible. The correlations with the
criterion X0 that enter the multiple correlation computed from this
table are: X1 .550, X2 .315, X3 -.380, X4 .318. Of the intercorrelations
among X1 to X4 only one entry, .093, is legible.]

* The sign of this r has been reflected; a positive r here signifies that
conservative teachers secured higher pupil change scores.

In a further attempt to discover what meas-
ures might be used to give the best measure of
teaching efficiency and how to combine these
to give the highest correlation with the crite-
rion, four of the measures that gave the highest
zero order correlations were combined in a re-
gression equation. The intercorrelations from
which this regression equation was calculated are
given in Table XXXIV.


The beta coefficients obtained by the Aitken
method* are as follows:

$$ \beta_{01} = .539 \qquad \beta_{03} = -.405 $$
$$ \beta_{02} = .291 \qquad \beta_{04} = .329 $$

The regression equation in standard score
form is therefore:

$$ z_0 = .539 z_1 + .291 z_2 - .405 z_3 + .329 z_4 $$

The multiple correlation can be immediately
calculated as follows:

$$ R = \sqrt{.539(.550) + .291(.315) + (-.405)(-.380) + .329(.318)} = .804 $$**
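
The arithmetic of the multiple correlation can be checked directly from the figures quoted above. The sketch uses the standard relation that R squared equals the sum of each beta coefficient times the corresponding correlation with the criterion; the variable names are hypothetical.

```python
from math import sqrt

# Beta coefficients and zero-order correlations with the criterion,
# in the order X1, X2, X3, X4, as quoted in the text.
betas = [0.539, 0.291, -0.405, 0.329]
validities = [0.550, 0.315, -0.380, 0.318]

# R^2 is the sum of beta * r over the predictors.
r_squared = sum(b * r for b, r in zip(betas, validities))
multiple_r = sqrt(r_squared)
```

Rounded to three places the result agrees with the reported value of .804.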








This multiple R represents considerable gain
over that reported by Rostker and Rolfe, in
that four measures here give a better correla-
tion than that secured by Rolfe for eleven
measures and equal to the correlation secured
by Rostker from ten measures. For fourteen
measures Rostker secured a multiple R of .85.


* Charles C. Peters and Walter R. VanVoorhis, Statistical
Procedures and Their Mathematical Basis (New York:
McGraw-Hill Book Co., 1940), p. 227.

** The author is indebted to Dr. A. S. Barr, Ronald D.
Jones, and L. Joseph Lins for this calculation.


SECTION VI 


SUMMARY, CONCLUSIONS, AND 
LIMITATIONS 


PURPOSE 


The purpose of this study was to determine 
the validity of certain teacher tests and rating 
scales as measures of teaching efficiency when 
pupil change is employed as the criterion. 

From the results of this investigation, the 
following generalizations are offered: 


1. Valid criteria of teaching efficiency based 
upon objectively determined pupil change in 
different aspects of the various subject areas, 
may be determined only with difficulty. The 
validity of the criteria will be limited by the 
validity and reliability of the pupil tests used. 
As better instruments for measuring pupil 
change are constructed, including reactions 
other than those that may be registered through 
paper and pencil tests, better measurement of 
teaching may result. 

It should here be emphasized that the cri-
teria of teaching efficiency objectively deter-
mined in this study might be more appropri-
ately labeled “criterion of teaching apprecia-
tions as related to Community Living,” “crite-
rion of teaching information related to Com-
munity Living,” etc., since but one area of
teaching is involved. Since there is no evidence
to show that teaching efficiency in the area
studied is directly related to efficiency in other
areas, no such inferences are drawn with refer-
ence to these areas.







2. Intelligence of teachers as measured by
the total score and part scores on the American 
Council Psychological Examination is signifi- 
cantly related to teaching efficiency as measured 
here (.61). 

3. Professional knowledge of the theory and 
practice of mental hygiene is positively but not 
significantly related to teaching efficiency (.35). 

4. Whether a teacher is liberal or conserva-
tive with reference to educational objectives,
purposes, and policies seems to make
little difference in her efficiency. There is a
tendency for the efficient teacher to be con-
servative in her teaching methods (-.32).

5. The teacher's attitude toward her profes- 
sion or toward her fellow teachers as herein 
measured showed little relationship to her 
efficiency (.16). 

6. Teachers who are the more considerate of 
others as here measured tend to be inefficient, 
although the relationship is not statistically 
significant (—.35). 

7. Ratings of teaching efficiency by superin- 
tendents and supervising teachers do not agree 
with the criterion of pupil gain. 




8. The use of different rating scales by the
same rater on the same teachers results in con-
siderable difference in the teacher ranking.


RECOMMENDATIONS 


The outcomes of this study make the general
problem of the measurement of teaching effi-
ciency more challenging than before. The tech-
nique of securing pupil change attributable to
the teacher has been somewhat clarified and
simplified. The principal weakness of the study
lay in the fact that pupil change, and therefore
teaching efficiency, was determined for but a
small part of the complete school experience
of the pupils.

Future studies of this type should extend the
criteria to include all of the pupil's school activ-
ities. Using the findings of this and similar
studies, investigators should construct measures
which might more nearly measure factors
accompanying or paralleling teacher success.


[The remainder of the studies in this series
will be reported in the December issue.]








